Abstract

Recent work has discussed the importance of multiplicative closure for the Markov mod- 
els used in phylogenetics. For continuous-time Markov chains, a sufficient condition for 
multiplicative closure of a model class is ensured by demanding that the set of rate-matrices
belonging to the model class form a Lie algebra. It is the case that some well-known Markov
models do form Lie algebras and we refer to such models as "Lie Markov models". However
it is also the case that some other well-known Markov models unequivocally do not form 
Lie algebras (GTR being the most conspicuous example). 

In this paper, we will discuss how to generate Lie Markov models by demanding that 
the models have certain symmetries under nucleotide permutations. We show that the 
Lie Markov models include, and hence provide a unifying concept for, "group-based" and 
"equivariant" models. For each of two, three and four character states, the full list of 
Lie Markov models with maximal symmetry is presented and shown to include interesting 
examples that are neither group-based nor equivariant. We also argue that our scheme 
is pleasing in the context of applied phylogenetics, as, for a given symmetry of nucleotide 
substitution, it provides a natural hierarchy of models with increasing number of parameters. 
1 Introduction



Continuous-time Markov chains are fundamental to the implementation of, and philosophy
behind, many phylogenetic methods. Likelihood and Bayesian phylogenetic methods usually
proceed by attempting to fit a single "rate-matrix" globally across a proposed evolutionary tree
history (see, for example, Chapters 2 and 3 of Gascuel (2005)). These rate-matrices are chosen
from some restricted class or "model" that is defined by a certain set of constraints on the
elements of a generic rate-matrix. These constraints define a set of free parameters that usually
correspond to unknown evolutionary quantities such as base composition, mutation rates and the
timing of speciation events (these last two are often by necessity confounded together simply as
"edge lengths"). Even in phylogenetic distance methods, it is usually the case that the
theoretical justification of a given distance estimator is taken from a continuous-time Markov model (for
example the general Markov model for the "log-det" distance (Steel, 1994) or the HKY distance
taken from its corresponding model (Felsenstein)).

A homogeneous Markov chain satisfies the condition that the probability transition rates are 
constant in time. In the phylogenetic context this means that the rates are unchanged throughout 
evolutionary history. Of course, this is used as an approximation to biological reality where it is
well documented that transition rates are not only time-dependent (Ho et al., 2005, 2007), but
also vary across the different lineages of the evolutionary tree (Lockhart et al., 1998). Methods
to cope with these issues have been explored by various authors: Tuffley & Steel (1997) proposed
the "covarion" model where a switching process allows sites to alternate between "on" and "off"
states. Drummond et al. (2006) proposed a method that introduces an overall scaling factor
for the transition rates that is sampled randomly (at branching events, for example), and the
methods presented in Whelan (2008) are more general still with a switching process that allows
for alteration of individual rates. The simulation package discussed in Fletcher & Yang (2009)
provides further evidence that these issues are of ongoing importance to phylogenetic analysis.

Our philosophy is to remain agnostic as to whether evolutionary rates have changed in the 
past or, indeed, whether it is possible to statistically detect this change via analysis of present 
day molecular data. We follow an approach that allows for the biological possibility that there 
is likely to have been a smooth (or even abrupt) change of each individual transition rate in- 
dependently occurring across the evolutionary tree (and not necessarily restricted to branching 
events). This discussion leads naturally to confronting the possibility (at least theoretically) 
that the phylogenetic process is not homogeneous and is more accurately modelled as an in- 
homogeneous continuous-time Markov chain; where the rate-matrix is far from constant and 
ultimately is allowed to vary, smoothly or otherwise, as a function of edge length parameters of 
the evolutionary tree. 

Of course, given the bias/variance tradeoff of statistical analysis (Burnham & Anderson,
2002), modelling phylogenetic evolution as an inhomogeneous process is statistically
implausible in practice (we would effectively be replacing a small number of parameters by an infinite
continuum). Indeed this is where the methods discussed in Drummond et al. (2006), where rates
may change but only at branching events, can be seen as somewhat of an intelligent compromise
between a (statistically tractable) homogeneous model and a (biologically realistic)
inhomogeneous model. Another approach would be to abandon the continuous-time hypothesis and work
with discrete Markov chains (or equivalently "algebraic" models (Pachter & Sturmfels, 2005)).
However this approach introduces many free parameters and suffers from a lack of interpretation, 
as it is unclear what the free algebraic parameters mean in biological terms (such as divergence 
times and molecular rates), except with reference to the corresponding continuous-time approach. 

An available resolution of these issues is to observe that it is possible to continue to model 
phylogenetic processes as being homogeneous, but interpret the transition rates that are fitted 
globally across the tree (or at least non-locally) as a kind of "average" of the true inhomogeneous
process. It is this perspective that we take in this work and it leads directly to the concept of 
multiplicative closure for continuous-time Markov chains. It will be shown that models that are 
multiplicatively closed have the property that, even in their inhomogeneous formulation, it is 
possible to interpret their average behaviour as a homogeneous process. It is then the purpose of 
this article to discuss sufficient conditions for multiplicative closure of continuous-time Markov 
chains. In order to generate particular examples of closed models, we exploit symmetry properties 
of DNA substitution rates to present a scheme that creates a hierarchy of closed Markov models 
based on the number of free parameters available. 

In §2 we give basic definitions of multiplicative closure and Lie Markov models. To achieve
this we review the required Lie theory, and we discuss the Lie algebra of the general Markov
model. As an example to motivate the general procedure, in §3 we specialize to the case of
binary Markov chains and give a complete description of Lie Markov models in this case. In
§4 we discuss the symmetry properties of Markov models and explain how symmetry can be
used to assist in the search for Lie Markov models. Here we also prove that equivariant (see
Draisma & Kuttler (2008); Casanellas & Fernández-Sánchez (2010)) and group-based models are
examples of Lie Markov models. In §5 we give a general scheme for generating a full list of Lie
Markov models with a given symmetry property. In §6 we explicitly give four state Lie Markov
models with maximal symmetry. Finally, §7 discusses implications and possibilities for future
work.



2 Lie algebras and closure of Markov models 

For algebraic simplicity we work over the complex field C, and refer to a matrix as "Markov" if 
it has unit column sums. Later we will discuss how our discussion specializes to the stochastic 
case where the entries must be real and lie in the range [0, 1]. Rather than work directly with 
the general Markov model, we will also consider only Markov matrices that have non-zero deter- 
minant. Although this need not be the case for a general Markov matrix, it is not too stringent 
a condition as (i) the set of Markov matrices with zero determinant is of measure zero in the 
set of Markov matrices (this is because they are defined by the vanishing of a single polynomial 
function and hence lie in an ambient space of dimension one less than the set of generic Markov 
matrices), (ii) Markov matrices that arise from a continuous-time formulation have non-zero 
determinant (as we will see shortly). In any case, in the conclusions we will argue that under- 
standing Markov matrices with zero determinant becomes easier once we understand how the 
rest can be categorized. 

Let the general Markov model $\mathfrak{M}_{GMM}$ be the set of $n \times n$ matrices with column sum 1:

\[ \mathfrak{M}_{GMM} := \{ M \in M_n(\mathbb{C}) : \theta^T M = \theta^T \}, \]

where $\theta$ is the column $n$-vector with all its entries equal to 1, i.e. $\theta^T = (1, 1, \ldots, 1)$. Specializing
further, consider the subset of matrices in $\mathfrak{M}_{GMM}$ with non-zero determinant:

\[ GL_1(n, \mathbb{C}) := \{ M \in M_n(\mathbb{C}) : \theta^T M = \theta^T, \ \det(M) \neq 0 \}. \]

In turn, this set of matrices includes a subset of matrices that arise by taking the exponential of
a rate-matrix; that is, the exponential of a matrix in

\[ \mathfrak{L}_{GMM} := \{ Q \in M_n(\mathbb{C}) : \theta^T Q = 0^T \}. \qquad (1) \]

We will refer to $e^{\mathfrak{L}_{GMM}} := \{ e^Q : Q \in \mathfrak{L}_{GMM} \}$ as "the general rate-matrix model" and below we
will discuss matrix exponentials in more detail (particularly their importance to Lie theory).
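
As a numerical illustration of these definitions, the following minimal sketch exponentiates a small rate-matrix and checks that the result has unit column sums and non-zero determinant. The 3-state matrix `Q`, the helper names and the truncated Taylor series used for the exponential are all assumptions made for this example only.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, terms=40):
    """Truncated Taylor series 1 + q + q^2/2! + ...; adequate for small entries."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds q^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, q)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def column_sums(m):
    return [sum(row[j] for row in m) for j in range(len(m))]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical 3-state rate-matrix: every column sums to zero.
Q = [[-0.3, 0.1, 0.2],
     [0.2, -0.5, 0.1],
     [0.1, 0.4, -0.3]]

M = mat_exp(Q)
trace_Q = sum(Q[i][i] for i in range(3))

print(column_sums(M))               # each entry should be 1 up to rounding
print(det3(M), math.exp(trace_Q))   # det(e^Q) should agree with e^{tr Q}
```

The determinant check uses the identity $\det(e^Q) = e^{\mathrm{tr}(Q)}$, which is why matrices arising from the continuous-time formulation are always non-singular.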

As the inverse of a Markov matrix (if it exists) is also a Markov matrix it is clear that
$GL_1(n, \mathbb{C})$ is actually a subgroup of the general linear group $GL(n, \mathbb{C})$, and it follows that
$GL_1(n, \mathbb{C})$ and $e^{\mathfrak{L}_{GMM}}$ are actually Lie groups (see Stillwell (2008) for the relevant technical
definitions). In fact we have the isomorphism $GL_1(n, \mathbb{C}) \cong A(n-1, \mathbb{C})$ where $A(n-1, \mathbb{C})$ is
the (complex) affine group (see for example Baker (2003)). This observation allows the
general methods of Lie theory to be applied to understanding continuous-time Markov models; see
Johnson (1985) and Mourad (2004) for general results and discussion, and Sumner et al. (2008)
for applications to phylogenetics.

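Summarizing, we have the following set inclusions

\[ e^{\mathfrak{L}_{GMM}} \subset GL_1(n, \mathbb{C}) \subset \mathfrak{M}_{GMM}, \]

and Lie group hierarchy

\[ e^{\mathfrak{L}_{GMM}} < GL_1(n, \mathbb{C}) < GL(n, \mathbb{C}). \]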



We define a Markov model $\mathfrak{M}$ by taking $\mathfrak{M} \subseteq \mathfrak{M}_{GMM}$ as some well defined subset of the
general Markov model. Similarly, a rate-matrix model $e^{\mathfrak{L}}$ is defined by taking $\mathfrak{L} \subseteq \mathfrak{L}_{GMM}$ as
some well defined subset of rate-matrices drawn from the general rate-matrix model and taking
the set of exponentials thereof (as in (1)). It follows immediately from these definitions that all
rate-matrix models are Markov models. In what follows we are primarily interested in the case
that $\mathfrak{M} = e^{\mathfrak{L}}$, and in this case we will abuse our terminology and refer to $\mathfrak{L}$ as a "model".

Definition 2.1. A Markov model $\mathfrak{M}$ is said to be multiplicatively closed if and only if for all
$M_1, M_2 \in \mathfrak{M}$ we also have $M_1 M_2 \in \mathfrak{M}$.

Of course, recalling that matrix multiplication is associative, this is exactly the statement
that $\mathfrak{M}$ forms a semigroup under matrix multiplication.

Presently we explore the conditions under which we can expect a rate-matrix model to be closed.
Consider an extension of standard phylogenetic models where each edge $e$ of the tree has its own
rate-matrix $Q_e$ chosen from some model $\mathfrak{L}$. We can envisage this process arising under a model
where at each branching event a new set of transition rates is chosen (a generalization of the
ideas given in Drummond et al. (2006)). Now, if we remove a single taxon from our tree there is
a standard marginalization procedure that will give us a new tree with one fewer taxon. However,
on this new tree there will now be an edge $e_{ab}$ which is the join of the edges $e_a = (u_a, v_a)$ and
$e_b = (u_b, v_b)$ with $u_b = v_a$ from the original tree, that is $e_{ab} = (u_a, v_b)$ (see Figure 1 for an
illustration of this). Under the marginalization, the transition matrix for the edge $e_{ab}$ will then
be the product of the matrices from edges $e_a$ and $e_b$:

\[ M_{e_{ab}} = M_{e_b} M_{e_a} = e^{Q_b \tau_b} e^{Q_a \tau_a}, \]

where $Q_a, Q_b \in \mathfrak{L}$ and $\tau_a, \tau_b$ are the corresponding edge lengths. Now, for $e^{\mathfrak{L}}$ to be closed we
require

\[ M_{e_{ab}} = e^{Q_{ab}(\tau_a + \tau_b)}, \quad \text{with } Q_{ab} \in \mathfrak{L}. \]



The question then naturally arises, is it obvious that this will be the case no matter how we
define our model $\mathfrak{L}$? To understand what could go wrong, consider two matrices $X, Y \in M_n(\mathbb{C})$
and recall the classical Baker-Campbell-Hausdorff (BCH) formula (Campbell, 1897):

\[ e^X e^Y = \exp\left( X + Y + \tfrac{1}{2}[X, Y] + \tfrac{1}{12}[X, [X, Y]] + \ldots \right), \]



where $[X, Y] := XY - YX$ is known as the commutator (or Lie bracket) of $X$ and $Y$, and
the higher order terms are all given as further commutators of commutators of $X$ and $Y$ (see
Stillwell (2008) for an elementary proof). This formula generalizes the extremely well-known
rule for the product of two exponentials, $e^x e^y = e^{x+y}$, from the case of commuting variables,
$xy = yx$, to the more general case of non-commuting variables. This is achieved by "correcting"
for the non-commutativity of matrix multiplication with the addition of the further commutators.

Figure 1: $M_{e_{ab}} = e^{Q_{ab}(\tau_a + \tau_b)}$, with $Q_{ab} \in \mathfrak{L}$.

Figure 2: An inhomogeneous process.
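
The size of the commutator correction in the BCH formula can be seen numerically. The sketch below compares $e^X e^Y$ with $e^{X+Y}$ and with the exponential of the series truncated after the first commutator term; the two nilpotent matrices, the scale `t = 0.1` and the Taylor-series exponential are hypothetical choices made only for illustration.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(*ms):
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def commutator(x, y):
    """Lie bracket [x, y] = xy - yx."""
    return mat_add(mat_mul(x, y), scale(-1.0, mat_mul(y, x)))

def mat_exp(q, terms=40):
    """Truncated Taylor series for the matrix exponential."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, q)]
        result = mat_add(result, term)
    return result

def max_abs_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a)))

t = 0.1  # small scale so that the low-order BCH terms dominate
X = [[0.0, t], [0.0, 0.0]]
Y = [[0.0, 0.0], [t, 0.0]]

lhs = mat_mul(mat_exp(X), mat_exp(Y))
naive = mat_exp(mat_add(X, Y))                                    # ignores [X, Y]
corrected = mat_exp(mat_add(X, Y, scale(0.5, commutator(X, Y))))  # adds (1/2)[X, Y]

err_naive = max_abs_diff(lhs, naive)
err_corrected = max_abs_diff(lhs, corrected)
print(err_naive, err_corrected)
```

With these matrices the first commutator term removes the leading (second-order in `t`) discrepancy, so the remaining error is of third order.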

Considering again the case at hand, if we replace $X$ and $Y$ with our rate-matrices $Q_a$ and
$Q_b$, we see that a sufficient condition for a model $\mathfrak{L}$ to satisfy the required condition is for it to
be a Lie algebra:

\[ \tau_a Q_a + \tau_b Q_b \in \mathfrak{L} \quad \text{and} \quad [Q_a, Q_b] \in \mathfrak{L}, \]

for all $Q_a, Q_b \in \mathfrak{L}$ and $\tau_a, \tau_b$.
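
To see why these two conditions suffice, it may help to write out the first few terms of the BCH series for the product along the joined edge. The following is only a sketch: it substitutes $X = \tau_b Q_b$ and $Y = \tau_a Q_a$ into the BCH formula quoted above and shows just the low-order terms.

```latex
\[
\log\left( e^{Q_b \tau_b} e^{Q_a \tau_a} \right)
  = \tau_b Q_b + \tau_a Q_a
  + \tfrac{1}{2} \tau_a \tau_b \, [Q_b, Q_a]
  + \tfrac{1}{12} \tau_b^2 \tau_a \, [Q_b, [Q_b, Q_a]]
  + \ldots
\]
```

Every term on the right is either a linear combination of $Q_a$ and $Q_b$ or an iterated commutator of them, so each partial sum lies in the model whenever the model is a Lie algebra. Provided the series converges, dividing its limit by $\tau_a + \tau_b$ gives a candidate for the averaged rate-matrix $Q_{ab}$.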

Suppose we have a model $\mathfrak{L}$ that forms a Lie algebra, and suppose that we choose a sequence
of rate-matrices from it: $(Q_1, Q_2, \ldots, Q_m)$. Taking parameters $\tau_1, \tau_2, \ldots, \tau_m$ with
$t = \tau_1 + \tau_2 + \ldots + \tau_m$, we can consider the inhomogeneous process given by

\[ e^{Q_1 \tau_1} e^{Q_2 \tau_2} \cdots e^{Q_m \tau_m}. \]

However, because $\mathfrak{L}$ forms a Lie algebra, we can conclude, via repeated application of the BCH
formula, that there exists a rate-matrix $Q \in \mathfrak{L}$ that acts as a homogeneous average:

\[ M(t) = e^{Q_1 \tau_1} e^{Q_2 \tau_2} \cdots e^{Q_m \tau_m} = e^{Qt}. \]

Thus we see that if we restrict our attention to models that form Lie algebras, we are free to
interpret fitting phylogenetic data as finding an average of the (possibly) true inhomogeneous
process.$^1$

For example, consider the general time reversible model (GTR) (Felsenstein, 2004), with
rate-matrices $Q$ satisfying $QD(\pi) = D(\pi)Q^T$ (i.e. $QD(\pi)$ is symmetric) where $\pi$ is any (column)
distribution vector$^2$,

\[ \pi = (\pi_1, \pi_2, \ldots, \pi_n), \quad \pi_i > 0, \quad \pi_1 + \pi_2 + \ldots + \pi_n = 1, \]

and $D(\pi)$ is the matrix with $\pi$ on the diagonal and zeros elsewhere. Taken over the complex
field we can express the GTR model using the constraints

\[ \mathfrak{L}_{GTR} = \{ Q \in \mathfrak{L}_{GMM} : \exists \pi \text{ s.t. } QD(\pi) = D(\pi)Q^T \}. \qquad (2) \]



$^1$ Here the technical issue arises that the BCH formula does not guarantee that the series $X + Y + \tfrac{1}{2}[X, Y] + \ldots$
will actually converge. Although this issue need not concern us in the present work because there will be some
radius of convergence for the series, we refer the concerned reader to Blanes & Casas (2004) for further analysis.

$^2$ We exclude $\pi_i = 0$ in order to avoid some unimportant technicalities in what follows.

Notice that, because $QD(\pi) = D(\pi)Q^T$ implies $Q^m D(\pi) = D(\pi)(Q^m)^T$ for all $m$, the GTR
model expressed as transition matrices satisfies exactly the same conditions:

\[ \{ M \in e^{\mathfrak{L}_{GMM}} : \exists \pi \text{ s.t. } MD(\pi) = D(\pi)M^T \}. \]

To show that the GTR model does not form a Lie algebra, we need the following result.

Lemma 2.2. If $X \in M_n(\mathbb{C})$ satisfies $XD(v) = -D(v)X$ for some vector $v = (v_1, v_2, \ldots, v_n)^T$,
then, for all $i$ and $j$, either the matrix entry $X_{ij} = 0$ or $v_i = -v_j$.

Proof. The condition on $X$ can be expressed in components as $X_{ij} v_j = -v_i X_{ij}$ for all $i$ and $j$.
The result follows directly by considering the individual cases. □

From now on, we denote the $n \times n$ identity matrix as $1_n$ (or simply as $1$ when $n$ is understood
or unspecified). Now, consider two rate matrices $Q_1$ and $Q_2$ that satisfy (2) for the uniform
distribution $\pi = \tfrac{1}{n}\theta = \tfrac{1}{n}(1, 1, \ldots, 1)$. As $D(\tfrac{1}{n}\theta) = \tfrac{1}{n} 1_n$, it is clear that the required condition is
equivalent to demanding that $Q_1$ and $Q_2$ are symmetric. If the rate-matrices of the GTR model form
a Lie algebra, the commutator $[Q_1, Q_2]$ must satisfy (2) for some (possibly different) distribution
vector $\pi$. However,

\[ D(\pi)[Q_1, Q_2]^T = D(\pi)(Q_1 Q_2 - Q_2 Q_1)^T = D(\pi)(Q_2 Q_1 - Q_1 Q_2) = -D(\pi)[Q_1, Q_2], \]

where the second equality follows because $Q_1$ and $Q_2$ are symmetric by assumption. Thus the
GTR condition on the commutator becomes $[Q_1, Q_2] D(\pi) = -D(\pi)[Q_1, Q_2]$, and by Lemma 2.2
(recalling that every $\pi_i > 0$, so that $\pi_i = -\pi_j$ cannot hold) this is impossible unless
$[Q_1, Q_2] = 0$. To confirm that this is not true in general, consider the
$n \times n$ rate-matrices



Qi 
(where the missing entries are chosen so these matrices have zero column-sums). Clearly $Q_1, Q_2 \in \mathfrak{L}_{GTR}$
and $[Q_1, Q_2] = 0$ if and only if $\beta = \beta'$. Thus we conclude that



Result 1. The rate-matrices of the GTR model do not form a Lie algebra.

Result 2. The GTR model is not multiplicatively closed.

Proof. First we recall that the Perron-Frobenius theorem (see for example Berman & Plemmons
(1994)) states that for each Markov matrix $M$ with strictly positive matrix entries there is exactly one
probability vector $\pi$ satisfying $M\pi = \pi$. We consider two Markov matrices $M_1$ and $M_2$ satisfying the GTR
condition for the uniform distribution, i.e. they are symmetric: $M_1^T = M_1$ and $M_2^T = M_2$ (see
above). If GTR is multiplicatively closed there must exist a distribution vector $\pi$ such that the
product $M_1 M_2$ satisfies $M_1 M_2 D(\pi) = D(\pi)(M_1 M_2)^T$. However, $(M_1 M_2)^T = M_2^T M_1^T = M_2 M_1$,
thus the required condition is

\[ M_1 M_2 D(\pi) = D(\pi) M_2 M_1. \qquad (3) \]

Now $M_1 M_2$ and $M_2 M_1$ are both Markov matrices, so they satisfy

\[ \theta^T M_2 M_1 = \theta^T, \quad \theta^T M_1 M_2 = \theta^T. \]

Taking the transpose we have

\[ M_1 M_2 \theta = \theta, \quad M_2 M_1 \theta = \theta. \qquad (4) \]

On the other hand, consider

\[ M_1 M_2 \pi = M_1 M_2 (D(\pi)\theta) = D(\pi) M_2 M_1 \theta = D(\pi)\theta = \pi. \qquad (5) \]

Now we assume that $M_1$ and $M_2$ are chosen such that $M_1 M_2$ satisfies the condition of the
Perron-Frobenius theorem, and we see from (4) and (5) that we must have $\pi = \tfrac{1}{n}\theta$. However, this implies
that $D(\pi) = \tfrac{1}{n} 1$ which from (3) implies $M_1 M_2 = M_2 M_1$. The proof is completed by finding
particular examples of $M_1$ and $M_2$ that satisfy the required conditions and $M_1 M_2 \neq M_2 M_1$. To
this end consider the $n \times n$ Markov matrices:



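\[
M_1 = \begin{pmatrix}
* & a & b & c & \cdots & c \\
a & * & c & c & \cdots & c \\
b & c & * & c & \cdots & c \\
c & c & c & * & \cdots & c \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
c & c & c & c & \cdots & *
\end{pmatrix},
\qquad
M_2 = \begin{pmatrix}
* & a & b' & c & \cdots & c \\
a & * & c & c & \cdots & c \\
b' & c & * & c & \cdots & c \\
c & c & c & * & \cdots & c \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
c & c & c & c & \cdots & *
\end{pmatrix},
\]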
where $a, b, c > 0$, $a + b + (n-3)c < 1$ (with similar for $b'$), and the missing entries are chosen
so these matrices have unit column-sums. These matrices are symmetric and their product
satisfies the conditions of the Perron-Frobenius theorem. Finally it is easy to confirm that
$M_1 M_2 \neq M_2 M_1$, as required. □
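
The final step of the proof can be checked with exact arithmetic. The sketch below builds two symmetric Markov matrices for $n = 4$ that differ in a single pair of off-diagonal entries and confirms that they do not commute; the function name `symmetric_markov`, the layout of the entries and the parameter values are assumptions chosen for this illustration.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def symmetric_markov(a, b, c, n=4):
    """Symmetric matrix with a at (1,2), b at (1,3) and c at every other
    off-diagonal position; the diagonal is fixed by unit column sums."""
    m = [[c] * n for _ in range(n)]
    m[0][1] = m[1][0] = a
    m[0][2] = m[2][0] = b
    for j in range(n):
        m[j][j] = 1 - sum(m[i][j] for i in range(n) if i != j)
    return m

# Hypothetical parameter values, small enough that every entry is positive.
a, b, b_prime, c = F(1, 10), F(2, 10), F(3, 10), F(1, 20)
M1 = symmetric_markov(a, b, c)
M2 = symmetric_markov(a, b_prime, c)

P12 = mat_mul(M1, M2)
P21 = mat_mul(M2, M1)

print(P12[0][1], P21[0][1])  # these entries differ, so M1 and M2 do not commute
```

Since both factors are symmetric, the transpose of `P12` equals `P21`, so the product is symmetric only if the two matrices commute. With these values they do not, so `P12` fails the GTR condition for the uniform distribution.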

These results show that the GTR model is not multiplicatively closed and hence one cannot 
interpret a fit of the GTR model as a homogeneous average of a true inhomogeneous process. 
This observation poses serious questions of interpretation for any phylogenetic inferences achieved 
using the GTR model. It would be very interesting to perform an exploratory study of how much 
(or how little) the non-closure of GTR affects phylogenetic inference in practice, however this is 
outside the scope of our present discussion. 

Remark 1. The reader may object that our definition (2) of the GTR model is too general
and may prefer to define a different copy of "the general time-reversible model" for each fixed
distribution vector $\pi$:

\[ \mathfrak{L}_{GTR_\pi} := \{ Q \in \mathfrak{L}_{GMM} : QD(\pi) = D(\pi)Q^T \}. \]

After all, every rate-matrix in $\mathfrak{L}_{GTR_\pi}$ has stationary distribution $\pi$:

\[ QD(\pi) = D(\pi)Q^T \implies Q\pi = 0, \]

and thus $\mathfrak{L}_{GTR_\pi}$ seems to be a reasonably well motivated model. However, it is shown in
Jarvis & Sumner (2011) that $\mathfrak{L}_{GTR_\pi}$ does not form a Lie algebra for any $\pi$, and the proof
of Result 2 shows that $GTR_\pi$ with $\pi$ the uniform distribution is definitely not closed. Thus, it
seems unlikely that $\mathfrak{L}_{GTR_\pi}$ is closed for any choice of $\pi$ (see below for a proof). In any case, in
a practical context GTR is usually implemented by considering $\pi = (\pi_1, \pi_2, \ldots, \pi_n)^T$ as providing
"free parameters" that are inferred using the data at hand, and we therefore argue that it is the
more general form of GTR model (2) that is most relevant.

Result 3. The $GTR_\pi$ model is not closed for any distribution vector $\pi$.

Proof. If $GTR_\pi$ is closed we must have $M_1 M_2 D(\pi) = D(\pi)(M_1 M_2)^T = D(\pi) M_2^T M_1^T$ for all
$M_1, M_2 \in GTR_\pi$. However $M_i D(\pi) = D(\pi) M_i^T$ then implies that $M_1 M_2 D(\pi) = M_2 M_1 D(\pi)$.
Since $\pi$ is a distribution vector (see the definition above), we conclude that $D(\pi)^{-1}$ exists, so we
require $M_1 M_2 = M_2 M_1$, i.e. $GTR_\pi$ is abelian. To confirm that this cannot be the case, consider
two rate matrices $Q_1, Q_2 \in \mathfrak{L}_{GTR_\pi}$ such that $[Q_1, Q_2] \neq 0$ and the two paths $A(t) = e^{tQ_1}$ and
$B(t) = e^{tQ_2}$. Now $C(s, t) = A(t)B(s)A(t)^{-1} \in GTR_\pi$ for all $s, t$, but if $GTR_\pi$ is abelian we have
$C(s, t) = B(s)$, so $C(s, t)$ is independent of $t$. On the other hand

\[ \left. \frac{d}{ds} \left( \left. \frac{\partial C(s, t)}{\partial t} \right|_{t=0} \right) \right|_{s=0} = A'(0)B'(0) - B'(0)A'(0) = [Q_1, Q_2] \neq 0, \]

which is a contradiction. Constructing examples of $Q_1$ and $Q_2$ satisfying the required conditions
completes the proof. Consider the $n \times n$ rate matrices



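\[
Q_1 = \begin{pmatrix}
* & \pi_1\alpha & \pi_1\beta & 0 & \cdots & 0 \\
\pi_2\alpha & * & 0 & 0 & \cdots & 0 \\
\pi_3\beta & 0 & * & 0 & \cdots & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0
\end{pmatrix},
\qquad
Q_2 = \begin{pmatrix}
* & \pi_1\alpha & \pi_1\beta' & 0 & \cdots & 0 \\
\pi_2\alpha & * & 0 & 0 & \cdots & 0 \\
\pi_3\beta' & 0 & * & 0 & \cdots & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0
\end{pmatrix}.
\]

It is easy to show that $Q_i D(\pi) = D(\pi) Q_i^T$ while $[Q_1, Q_2] = 0$ if and only if $\beta' = \beta$. □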



Presently we will recall some key definitions and results from elementary Lie theory, and go
on to describe the Lie algebra associated with the general Markov model. For the reader who is
unfamiliar with the theory, we can recommend the elementary texts Stillwell (2008) and Tapp
(2005).



Definition 2.3. A matrix group $\mathcal{G}$ is a closed subgroup of $GL(n, \mathbb{C})$.

In this definition "closed" means in the topological sense. That is, if the limit of a sequence
of matrices in $\mathcal{G}$ is non-singular, then the limit is also in $\mathcal{G}$.

Definition 2.4. The Lie algebra of a matrix group $\mathcal{G}$ is the tangent space of $\mathcal{G}$ (regarded as a
manifold) at the identity: $T_1(\mathcal{G})$. That is, $X \in T_1(\mathcal{G})$ is equivalent to the existence of a smooth
path $A : [0, 1] \to \mathcal{G}$ such that $A(0) = 1$ and $A'(0) = \left. \frac{dA(t)}{dt} \right|_{t=0} = X$.

With this definition it is not too hard to show that $T_1(\mathcal{G})$ is a vector space over $\mathbb{R}$, because
$X + rY$ is the tangent of the product of two smooth paths $A(t)B(rt)$, with $A(0) = 1$, $B(0) = 1$,
$A'(0) = X$, $B'(0) = Y$ and $r \in \mathbb{R}$. Also, $T_1(\mathcal{G})$ is closed under Lie brackets, because $C'(0) = [X, Y]$
is the tangent of the path $C(t) := A(t)B'(0)A(t)^{-1} \in T_1(\mathcal{G})$ and $C'(0) \in T_1(\mathcal{G})$ because the
tangent space includes all its limit points.

A standard (and powerful) tool in Lie theory is the exponential map:

\[ \exp : X \in M_n(\mathbb{C}) \mapsto \exp(X) = 1 + X + \frac{X^2}{2!} + \frac{X^3}{3!} + \ldots, \]

which has infinite radius of convergence, so $e^X$ is defined for all $X$. Recall that $\det(e^X) = e^{\mathrm{tr}(X)}$,
so we can conclude that $e^X \in GL(n, \mathbb{C})$ for all $X \in M_n(\mathbb{C})$. By defining the path $A(t) := e^{Xt}$,
we see immediately that $A(0) = 1$ and $A'(0) = X$ so that $X \in T_1(GL(n, \mathbb{C}))$. Given that $X$ was
chosen arbitrarily, it is clear that $T_1(GL(n, \mathbb{C})) = M_n(\mathbb{C})$.

We must however pause to reflect that although $T_1(GL(n, \mathbb{C})) = M_n(\mathbb{C})$, it follows from
Definition 2.4 that $T_1(GL(n, \mathbb{C}))$ is a vector space only over $\mathbb{R}$. To clarify this point, observe that the
elementary matrices $\{E_{ij}\}_{i,j}$, with matrix elements $(E_{ij})_{kl} = \delta_{ik}\delta_{jl}$, form a basis for $M_n(\mathbb{C})$
considered as a complex vector space, i.e. $M_n(\mathbb{C}) = \langle \{E_{ij}\}_{i,j} \rangle_{\mathbb{C}}$. However, by definition $T_1(GL(n, \mathbb{C}))$
is a real vector space and we see that a basis is given by the set $\{E_{ij}, iE_{kl}\}_{i,j,k,l}$ where $i = \sqrt{-1}$,
i.e. $T_1(GL(n, \mathbb{C})) = \langle \{E_{ij}, iE_{kl}\}_{i,j,k,l} \rangle_{\mathbb{R}}$. These observations prompt the following definition:

Definition 2.5. The complexification of a (real) Lie algebra $T_1(\mathcal{G})$ is the complex vector space
$T_1(\mathcal{G})_{\mathbb{C}}$ spanned by all linear combinations $c_1 X + c_2 Y$ with $X, Y \in T_1(\mathcal{G})$ and $c_1, c_2 \in \mathbb{C}$. When
$T_1(\mathcal{G}) = T_1(\mathcal{G})_{\mathbb{C}}$ as sets and $T_1(\mathcal{G}) = T_1(\mathcal{G})_{\mathbb{C}} = \langle X_1, X_2, \ldots, X_k \rangle_{\mathbb{C}}$, where $\{X_i\}_{1 \leq i \leq k}$ is a set of
linearly independent tangent vectors over $\mathbb{C}$, we say that $\{X_i\}_{1 \leq i \leq k}$ forms a $\mathbb{C}$-basis of $T_1(\mathcal{G})$.

Remark 2. It should be noted that it is not always the case that $T_1(\mathcal{G}) = T_1(\mathcal{G})_{\mathbb{C}}$; in general this
is only true if $\mathcal{G}$ is a manifold over $\mathbb{C}$. Since this is the case for the matrix groups that we will
consider in this article (by the definitions given at the start of this section), we will henceforth
implicitly assume that complexification has been performed and that the Lie algebras we discuss
are vector spaces over $\mathbb{C}$.



Presently we derive the Lie algebra of the general Markov model, as was first given in Johnson
(1985). The commutator of any two elementary matrices can be checked explicitly:

\[ [E_{ij}, E_{kl}] = E_{il}\delta_{jk} - E_{kj}\delta_{il}. \qquad (6) \]

We then define the elementary rate-matrices as

\[ L_{ij} := E_{ij} - E_{jj}, \]

for every $i \neq j$ and note that $\theta^T L_{ij} = 0^T$ for every couple $(i, j)$ with $i \neq j$, so that these matrices
are indeed rate-matrices. It is then clear that we can express any rate-matrix $Q$ as a linear sum:

\[ Q = \sum_{i \neq j} \alpha_{ij} L_{ij}, \]

and we have: 

Lemma 2.6. The matrices $\{L_{ij}\}_{i \neq j}$ form a $\mathbb{C}$-basis for the tangent space of $GL_1(n, \mathbb{C})$.

Proof. General Lie theory states that the dimension of the tangent space is equal to the dimension
of the Lie group as a manifold. Considering $GL_1(n, \mathbb{C})$ as a subgroup of $GL(n, \mathbb{C})$ it is clear that
the dimension of $GL_1(n, \mathbb{C})$ as a manifold is $n(n-1)$. Now, for each $i \neq j$, $L_{ij}$ is a tangent vector
of the smooth path $A^{ij}(t) := e^{L_{ij}t} \in GL_1(n, \mathbb{C})$. Also, there are $n(n-1)$ of the $L_{ij}$ and they
are obviously linearly independent. Therefore the tangent space of $GL_1(n, \mathbb{C})$ at the identity is
$\langle L_{ij} \rangle_{\mathbb{C}}$. □

Result 4. The rate-matrices $\mathfrak{L}_{GMM} = \langle L_{ij} \rangle_{\mathbb{C}}$ form a Lie algebra.

Proof. The result follows directly as a consequence of Lemma 2.6. □

Indeed, the Lie algebra structure of $\mathfrak{L}_{GMM}$ follows from (6) and a little manipulation with
the elementary rate-matrices:

\[ [L_{ij}, L_{kl}] = (L_{il} - L_{jl})(\delta_{jk} - \delta_{jl}) - (L_{kj} - L_{lj})(\delta_{il} - \delta_{jl}). \qquad (7) \]

For convenience in subsequent calculations we record a few individual cases of these commutation
relations where we take $i, j, k, l$ all to be distinct:

\[ [L_{ij}, L_{kl}] = [L_{ij}, L_{il}] = 0, \quad [L_{ij}, L_{ki}] = L_{ij} - L_{kj}, \quad [L_{ij}, L_{jl}] = L_{il} - L_{jl}, \]
\[ [L_{ij}, L_{kj}] = L_{kj} - L_{ij}, \quad [L_{ij}, L_{ji}] = L_{ij} - L_{ji}. \]
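
The commutation relations for the elementary rate-matrices can be verified by brute force. The sketch below checks, for every admissible choice of indices, that the matrix commutator of $L_{ij}$ and $L_{kl}$ agrees with the closed form $(L_{il} - L_{jl})(\delta_{jk} - \delta_{jl}) - (L_{kj} - L_{lj})(\delta_{il} - \delta_{jl})$. Indices are 0-based here, and the convention that a symbol with repeated index such as $L_{ii}$ denotes the zero matrix is an assumption of this example (it follows from applying $E_{ij} - E_{jj}$ with $i = j$).

```python
from itertools import product

def E(i, j, n):
    """Elementary matrix with a single 1 in row i, column j."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def L(i, j, n):
    """Elementary rate-matrix L_ij = E_ij - E_jj (zero matrix when i == j)."""
    return sub(E(i, j, n), E(j, j, n))

def delta(i, j):
    return 1 if i == j else 0

def bracket(a, b):
    return sub(mat_mul(a, b), mat_mul(b, a))

def closed_form(i, j, k, l, n):
    first = scale(delta(j, k) - delta(j, l), sub(L(i, l, n), L(j, l, n)))
    second = scale(delta(i, l) - delta(j, l), sub(L(k, j, n), L(l, j, n)))
    return sub(first, second)

def check_commutators(n):
    """Compare both sides for all index choices with i != j and k != l."""
    for i, j, k, l in product(range(n), repeat=4):
        if i != j and k != l:
            if bracket(L(i, j, n), L(k, l, n)) != closed_form(i, j, k, l, n):
                return False
    return True

print(check_commutators(4))
```

Because all entries are integers the comparison is exact, so a `True` result confirms the relation for every index combination at the chosen `n`.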

Of course, this is a very convenient basis for $\mathfrak{L}_{GMM}$ because any "stochastic" rate-matrix $Q$
(with real and non-negative off-diagonal entries) can be written as $Q = \sum_{i \neq j} \alpha_{ij} L_{ij}$ where the
stochastic condition is simply that the coefficients $\alpha_{ij}$ are real and non-negative. This prompts
the following definition:

Definition 2.7. A Lie algebra $\mathfrak{L} \subseteq \mathfrak{L}_{GMM}$ has a stochastic basis if there exists a basis
$B_{\mathfrak{L}} = \{L_1, L_2, \ldots, L_d\}$ of $\mathfrak{L}$ such that each $L_k$ is a convex linear combination of the $L_{ij}$, i.e.
$L_k = \sum_{i \neq j} \alpha_{ij} L_{ij}$ where $\alpha_{ij} \geq 0$. In this case, we say that $e^{\mathfrak{L}}$ is a Lie Markov model.

In other words, each $L_k$ lies in the stochastic cone spanned by the vectors $\{L_{ij}\}_{i \neq j}$.

Definition 2.8. The dimension of a Lie Markov model $e^{\mathfrak{L}}$ is the vector-space dimension of $\mathfrak{L}$.

Remark 3. In most cases, this definition of dimension corresponds exactly to the number of
"free parameters" of a given model. For example, the dimension of the general Markov model is
$n(n-1)$.



3 Binary Lie Markov models 

In this section we describe the two-state (or binary) Lie Markov models in full detail. We will see 
that there is actually a continuous infinity of one-dimensional Lie Markov models in this case, 
and it is clear that this property generalizes to more states. This will motivate us to consider the 
symmetry of the two-state models to enable us to give a classification, and demanding similar 
symmetry in the models with more states will be the key to significant progress in those cases.


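Consider the general rate-matrix model on two states. As is discussed in (Jarvis & Sumner,
2011; Sumner et al., 2011), any $2 \times 2$ rate-matrix can be expressed as the linear sum

\[ Q = \alpha L_\alpha + \beta L_\beta = \begin{pmatrix} -\alpha & \beta \\ \alpha & -\beta \end{pmatrix}, \]

where

\[ L_\alpha = L_{21} = \begin{pmatrix} -1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad L_\beta = L_{12} = \begin{pmatrix} 0 & 1 \\ 0 & -1 \end{pmatrix}. \]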

Thus the Lie algebra of $GL_1(2, \mathbb{C})$ can be expressed as $\mathfrak{L}_{GMM} = \langle L_\alpha, L_\beta \rangle_{\mathbb{C}}$, with the non-trivial
commutator $[L_\alpha, L_\beta] = L_\alpha - L_\beta$. It is clear that $\mathfrak{L}_{GMM}$ equals the space of all $2 \times 2$ rate-matrices,
thus $\mathfrak{L}_{GMM}$ is the only two-state, two-dimensional Markov Lie algebra.

In the one-dimensional case, we can choose any tangent vector $L = \alpha L_\alpha + \beta L_\beta \in \mathfrak{L}_{GMM}$,
with fixed $\alpha$ and $\beta$ not both zero, and take $\mathfrak{L} = \langle L \rangle_{\mathbb{C}}$. The Lie algebra is trivial since the
only Lie bracket is identically zero. Additionally, two tangent vectors $L = \alpha L_\alpha + \beta L_\beta$ and
$L' = \alpha' L_\alpha + \beta' L_\beta$ give the same model if $\langle L \rangle_{\mathbb{C}} = \langle L' \rangle_{\mathbb{C}}$, which occurs if $(\alpha, \beta) = (\lambda\alpha', \lambda\beta')$ for
some complex number $\lambda \neq 0$. Thus by varying $(\alpha, \beta)$ up to this scaling equivalence we are led
to a complex projective space (that is, $\mathbb{CP}^1$) of one-dimensional two-state Lie Markov models.

Now this is not particularly satisfying as we would at least like to identify the
"binary-symmetric model", $\alpha = \beta$, as a special point in the $\mathbb{CP}^1$ continuum of models. It is clear that the
binary-symmetric model has a certain type of symmetry that the other one-dimensional models
do not share, and it is the exploration of this symmetry that will be of great assistance to us
when we study models with more than two states.

We have been implicitly considering the finite set $\{1, 2\}$ as the states of the
continuous-time Markov chain on two states. Consider the group of permutations of these two states:
$\mathfrak{S}_2 = \{e, (12)\}$. Notice that there is an action of $\mathfrak{S}_2$ on the $2 \times 2$ rate-matrices $Q \in \mathfrak{L}_{GMM}$:

\[ Q \mapsto K_\sigma Q K_\sigma^{-1}, \quad \forall \sigma \in \mathfrak{S}_2, \]

where $K_\sigma$ is the $2 \times 2$ permutation matrix representing $\sigma$. We demand that the one-dimensional
Lie algebra $\mathfrak{L} = \langle L \rangle_{\mathbb{C}}$ is invariant under this action of $\mathfrak{S}_2$, i.e.

\[ \langle K_\sigma L K_\sigma^{-1} \rangle_{\mathbb{C}} = \langle L \rangle_{\mathbb{C}}, \quad \forall \sigma \in \mathfrak{S}_2. \]

Of course, the only non-trivial constraint occurs for the permutation $\sigma = (12)$, and we are led
directly to the binary-symmetric model by noticing that $\alpha = \pm\beta$ are the only solutions of the
equation $K_{(12)} L K_{(12)}^{-1} = K_{(12)} (\alpha L_\alpha + \beta L_\beta) K_{(12)}^{-1} = \alpha L_\beta + \beta L_\alpha = \mu L$, with $\mu \in \mathbb{C}$. The
solution $\alpha = \beta$ is exactly the binary-symmetric model, and we can reject the solution $\alpha = -\beta$ as
this would mean that $L \propto L_\alpha - L_\beta$, which would not provide a stochastic basis for $\langle L \rangle_{\mathbb{C}}$. This
is the key to understanding how the binary-symmetric model sits as a special point in the $\mathbb{CP}^1$
of one-dimensional Lie Markov models.

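Further, the two-dimensional Lie algebra $\mathfrak{L}_{GMM} = \langle L_\alpha, L_\beta \rangle_{\mathbb{C}}$ is also invariant under the
action of $\mathfrak{S}_2$. This is because $K_{(12)} \langle L_\alpha, L_\beta \rangle_{\mathbb{C}} K_{(12)}^{-1} = \langle L_\beta, L_\alpha \rangle_{\mathbb{C}} = \langle L_\alpha, L_\beta \rangle_{\mathbb{C}}$. The fact that this
model has $\mathfrak{S}_2$ symmetry equates to the fact that, as measured by the model itself, neither of the
two states of the Markov chain is distinguished from the other.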
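
The invariance condition for the one-dimensional binary models can be tested directly. The sketch below conjugates a two-state rate-matrix by the permutation matrix of $(12)$ and checks whether the result is a scalar multiple of the original; the function names are hypothetical and integer coefficients are used so that the comparison is exact.

```python
K = [[0, 1], [1, 0]]  # permutation matrix of (12); it is its own inverse

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rate_matrix(alpha, beta):
    """Two-state rate-matrix with columns summing to zero."""
    return [[-alpha, beta], [alpha, -beta]]

def conjugate(q):
    """K q K^{-1} for the transposition (12)."""
    return mat_mul(mat_mul(K, q), K)

def span_is_invariant(alpha, beta):
    """True when K L K^{-1} is a scalar multiple of L = rate_matrix(alpha, beta)."""
    swapped = conjugate(rate_matrix(alpha, beta))
    new_alpha, new_beta = swapped[1][0], swapped[0][1]
    # (new_alpha, new_beta) is proportional to (alpha, beta) exactly when
    # this 2x2 determinant vanishes.
    return new_alpha * beta - new_beta * alpha == 0

print(conjugate(rate_matrix(2, 5)) == rate_matrix(5, 2))  # the two rates are swapped
print(span_is_invariant(1, 1), span_is_invariant(1, -1), span_is_invariant(1, 2))
```

Conjugation swaps the two rates, so the proportionality test reduces to $\alpha^2 = \beta^2$, matching the two solutions $\alpha = \pm\beta$ found in the text.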

Result 5. In the case of two-state continuous-time Markov models, there are exactly two Lie 
Markov models with $\mathfrak{S}_2$ symmetry: 

1. The binary-symmetric model generated by $\langle L_\alpha + L_\beta \rangle_{\mathbb{C}}$ with dimension one. 

2. The general Markov model generated by $\langle L_\alpha, L_\beta \rangle_{\mathbb{C}}$ with dimension two. 

Any other choice of one-dimensional model demands a choice of $\alpha$ and $\beta$, which from a 
practical point of view is somewhat equivalent to using the general rate-matrix model and using 
some kind of inference to choose $\alpha$ and $\beta$. Thus we find that our general approach of exploring 
Lie Markov models with a given symmetry has thus far achieved a satisfactory classification of 
two-state models. In the next section we explore this concept of model symmetry in more detail. 
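
The two-state argument above can be checked mechanically. The following Python sketch is purely illustrative and is not part of the original analysis: the helper names (`conj`, `combo`, `spans_invariant_line`) are our own, and the matrices assume the column convention $L_{ij}e_k = \delta_{jk}(e_i - e_k)$ for the elementary rate-matrices.

```python
# Elementary 2x2 rate matrices in the column-sum-zero convention
# L_ij e_k = delta_jk (e_i - e_k). All names here are illustrative.
L12 = [[0, 1], [0, -1]]
L21 = [[-1, 0], [1, 0]]
K = [[0, 1], [1, 0]]  # permutation matrix of (12); it is its own inverse


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def conj(M):
    """Return K M K^{-1} for the transposition (12)."""
    return matmul(matmul(K, M), K)


def combo(alpha, beta):
    """Return alpha*L12 + beta*L21."""
    return [[alpha * L12[r][c] + beta * L21[r][c] for c in range(2)]
            for r in range(2)]


def spans_invariant_line(alpha, beta):
    """True if K L K^{-1} is proportional to L = alpha*L12 + beta*L21."""
    a = [x for row in combo(alpha, beta) for x in row]
    b = [x for row in conj(combo(alpha, beta)) for x in row]
    # Two vectors are proportional iff every 2x2 minor vanishes.
    return all(a[i] * b[j] == a[j] * b[i] for i in range(4) for j in range(4))
```

Under these assumptions, `spans_invariant_line(alpha, beta)` should hold only when `alpha` equals plus or minus `beta`, mirroring the condition $\alpha = \pm\beta$ derived above; the stochastic-basis requirement that rules out $\alpha = -\beta$ is not tested by this sketch.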



4 Permutation symmetries of Markov models 

Roughly speaking, we say that a model has a symmetry if under a nucleotide permutation 
"something" doesn't change. The purpose of this section is to discuss what this something 
should be. In what follows we label nucleotides A, C, G, T with the integers 1, 2, 3, 4 respectively. 



4.1 Equivariant models 

Consider the graphical representation of the well-known Kimura 3ST model (K3ST) given in 
Figure 3. What this graph implies is that any rate-matrix chosen from the K3ST model has 
three free parameters that must be statistically inferred using the data at hand. By considering 
the graph, the most obvious symmetry that this model can have is given by the permutations of the 
nucleotides that leave the graph invariant. It is well known that these permutations form the 
group 

\[ \mathbb{Z}_2 \times \mathbb{Z}_2 = \{e, (12)(34), (13)(24), (14)(23)\}. \]

This observation motivates the following definition, first given in Draisma & Kuttler (2008). 

Figure 3: Graphical representation of the K3ST model. 



Definition 4.1. Given a group $G \le \mathfrak{S}_n$, the G-equivariant model denoted by $M_G$ is defined 
as the algebraic evolutionary model obtained when taking transition matrices $M \in M_n(\mathbb{C})$ such 
that $K_\sigma M K_\sigma^{-1} = M$ for every $\sigma \in G$ (here, $K_\sigma$ means the permutation matrix corresponding to 
$\sigma$). 

Remark 4. Notice that with this definition, the G-equivariant models are not Markov models. 

Definition 4.2. Given a group $G \le \mathfrak{S}_n$, we write $\mathfrak{M}_G = M_G \cap GL_1(n,\mathbb{C})$ and we call it the 
G-equivariant Markov model. We also write $\mathfrak{L}_G = \{Q \in \mathfrak{L}_{GMM} : K_\sigma Q K_\sigma^{-1} = Q,\ \forall \sigma \in G\}$ for the set of 
G-equivariant rate-matrices. Notice that $\mathfrak{L}_G$ is a vector space. 

Lemma 4.3. We have $T_1(\mathfrak{M}_G) = \mathfrak{L}_G$. 

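Proof. To prove this, we have to show that every matrix in $\mathfrak{L}_G$ is the derivative of some smooth 
path in $\mathfrak{M}_G$, and the converse. Now, if $X \in \mathfrak{L}_G$, consider $A(t) = e^{Xt}$. Clearly, $A(0) = 1$ and 
$A'(0) = X$. Moreover, $\det(A(t)) = e^{t\,\mathrm{Tr}(X)} \neq 0$ and, if $\sigma \in G$, 

\[ K_\sigma A(t) K_\sigma^{-1} = K_\sigma \Big( \sum_{i \ge 0} \frac{t^i}{i!} X^i \Big) K_\sigma^{-1} = \sum_{i \ge 0} \frac{t^i}{i!} \big( K_\sigma X^i K_\sigma^{-1} \big) = \sum_{i \ge 0} \frac{t^i}{i!} X^i = A(t). \]

This shows that the path $A(t) \subset \mathfrak{M}_G$ and proves one inclusion. For the converse, let $A(t)$ be any 
smooth path in $\mathfrak{M}_G$ with $A(0) = 1$. For any $\sigma \in G$, we have 

\[ K_\sigma A(t) K_\sigma^{-1} = A(t). \]

Taking the derivative at $t = 0$, we see that 

\[ K_\sigma A'(0) K_\sigma^{-1} = A'(0), \]

and we infer that the tangent vector $A'(0)$ is in $\mathfrak{L}_G$. From this, the other inclusion follows. □ 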

Proposition 4.1. Considered as continuous-time models $e^{\mathfrak{L}_G}$, the "equivariant models" are Lie 
Markov models. 

Proof. Let $\sigma \in G$ and let $X, Y \in \mathfrak{L}_G$. Then, we have 

\[ K_\sigma [X, Y] K_\sigma^{-1} = K_\sigma (XY - YX) K_\sigma^{-1} = K_\sigma X K_\sigma^{-1} K_\sigma Y K_\sigma^{-1} - K_\sigma Y K_\sigma^{-1} K_\sigma X K_\sigma^{-1} = XY - YX, \]

which shows that, considered as continuous-time models, the rate-matrices of equivariant models 
form Lie algebras. □ 
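
As a numerical illustration of Proposition 4.1 (a sketch under stated assumptions, not the authors' computation), the Python fragment below builds two rate-matrices that are equivariant under the hypothetical choice $G = \{e, (12)(34)\}$ by summing an arbitrary rate-matrix over its conjugates, and then checks that their commutator is again $G$-equivariant with zero column sums. The function names and the particular integer rates are invented for the example.

```python
N = 4
G = [(0, 1, 2, 3), (1, 0, 3, 2)]  # {e, (12)(34)}, written 0-based


def conjugate(p, M):
    """Return K_p M K_p^{-1}: entry (i, j) of M moves to (p[i], p[j])."""
    out = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            out[p[i]][p[j]] = M[i][j]
    return out


def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    def mul(X, Y):
        return [[sum(X[r][k] * Y[k][c] for k in range(N)) for c in range(N)]
                for r in range(N)]
    AB, BA = mul(A, B), mul(B, A)
    return [[AB[r][c] - BA[r][c] for c in range(N)] for r in range(N)]


def arbitrary_rate_matrix(seed):
    """Integer rate matrix with made-up positive rates and zero column sums."""
    Q = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i != j:
                Q[i][j] = (seed * (3 * i + 5 * j + 1)) % 7 + 1
    for j in range(N):
        Q[j][j] = -sum(Q[i][j] for i in range(N) if i != j)
    return Q


def symmetrize(Q):
    """Sum of the conjugates of Q over G; the result is G-equivariant."""
    images = [conjugate(p, Q) for p in G]
    return [[sum(M[r][c] for M in images) for c in range(N)] for r in range(N)]


X = symmetrize(arbitrary_rate_matrix(2))
Y = symmetrize(arbitrary_rate_matrix(3))
Z = bracket(X, Y)  # expected to be G-equivariant as well
```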

To every Markov model, we associate an oriented graph with n vertices $\{v_1, v_2, \ldots, v_n\}$ in 
the following way: vertices represent the possible states, one node for each state. Each ordered 
pair of vertices $(v_i, v_j)$ is connected by an arrow from $v_i$ to $v_j$ with a parameter $\alpha_{ij}$ representing 
the rate at which state $v_i$ changes into state $v_j$. In this way, every single rate-matrix in the model 
provides values for these parameters, with different matrices providing different values. 

Given such a graph, say $\Gamma$, and a permutation $\sigma \in \mathfrak{S}_n$, fix an ordering in the set of vertices. 
Then there is a natural action of $\sigma$ on $\Gamma$ defined by mapping $v_i$ to $v_{\sigma(i)}$. Notice that this induces 
an action of $\mathfrak{S}_n$ on the set of ordered pairs of vertices 

\[ \sigma : (v_i, v_j) \mapsto (v_{\sigma(i)}, v_{\sigma(j)}), \]

and we infer there is a group homomorphism 

\[ \rho : \mathfrak{S}_n \to \mathfrak{S}_{n(n-1)}. \]

Under the action of such a permutation, we denote the new graph by $\sigma\Gamma$, and $\Gamma$ is said to be 
invariant under $\sigma$ if $\sigma\Gamma = \Gamma$. This occurs if and only if 

\[ \alpha_{ij} = \alpha_{\sigma(i)\sigma(j)}, \qquad \forall v_i, v_j. \tag{8} \]

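Proposition 4.2. If $K_\sigma$ is the $n \times n$ permutation matrix representing $\sigma \in \mathfrak{S}_n$ in the standard 
basis, the homomorphism $\rho : \mathfrak{S}_n \to \mathfrak{S}_{n(n-1)}$ described above in the case of $\mathfrak{L}_{GMM}$ is given by: 

\[ K_\sigma L_{ij} K_\sigma^{-1} = L_{\sigma(i)\sigma(j)}. \tag{9} \]

Proof. Consider the standard basis vectors $e_1, e_2, \ldots, e_n$ of $\mathbb{C}^n$ with $K_\sigma e_i = e_{\sigma(i)}$ for all $\sigma \in \mathfrak{S}_n$. 
It is easy to check that $L_{ij} e_k = \delta_{jk}(e_i - e_k)$, so $L_{\sigma(i)\sigma(j)} e_k = \delta_{\sigma(j)k}(e_{\sigma(i)} - e_k)$. On the other hand, 
$K_\sigma L_{ij} K_\sigma^{-1} e_k = K_\sigma L_{ij} e_{\sigma^{-1}(k)} = \delta_{j\sigma^{-1}(k)}(e_{\sigma(i)} - e_k)$. Confirming that $\delta_{\sigma(j)k} = \delta_{j\sigma^{-1}(k)}$ for all j, 
k and $\sigma \in \mathfrak{S}_n$ completes the proof. □ 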
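
Relation (9) is easy to confirm by brute force for small n. The sketch below is an illustration only; the function names are ours, indices are 0-based, and the permutation matrices are built so that $K_\sigma e_i = e_{\sigma(i)}$ as in the proof.

```python
from itertools import permutations


def L(i, j, n):
    """Elementary rate matrix with L_ij e_k = delta_jk (e_i - e_k)."""
    M = [[0] * n for _ in range(n)]
    M[i][j] += 1
    M[j][j] -= 1
    return M


def perm_matrix(p):
    """Permutation matrix K with K e_c = e_{p[c]}."""
    n = len(p)
    return [[1 if p[c] == r else 0 for c in range(n)] for r in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def proposition_4_2_holds(n):
    """Check K L_ij K^{-1} == L_{p(i) p(j)} for every permutation p of n states."""
    for p in permutations(range(n)):
        K = perm_matrix(p)
        K_inv = [list(row) for row in zip(*K)]  # inverse equals transpose
        for i in range(n):
            for j in range(n):
                if i != j and matmul(matmul(K, L(i, j, n)), K_inv) != L(p[i], p[j], n):
                    return False
    return True
```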

Lemma 4.4. Let $\Gamma$ be the graph associated to a Markov model $\mathfrak{L} = \langle L_1, \ldots, L_d \rangle_{\mathbb{C}}$, and let $\sigma \in \mathfrak{S}_4$ 
be a permutation. Then, $\Gamma$ is invariant under $\sigma$ if and only if $K_\sigma L_i K_\sigma^{-1} = L_i$, $\forall i \in \{1, 2, \ldots, d\}$. 

Proof. Regarding the model $\mathfrak{L}$ as a vector subspace of $\mathfrak{L}_{GMM}$, the coordinates of the rate-matrices 
in $\mathfrak{L}$ in the basis $\{L_{ij}\}$ are just the mutation rates $\alpha_{ij}$: 

\[ Q = \sum_{i \neq j} \alpha_{ij} L_{ij}. \]

Now, we have 

\[ K_\sigma Q K_\sigma^{-1} = \sum_{i \neq j} \alpha_{ij} K_\sigma L_{ij} K_\sigma^{-1} = \sum_{i \neq j} \alpha_{ij} L_{\sigma(i)\sigma(j)} = \sum_{i \neq j} \alpha_{\sigma^{-1}(i)\sigma^{-1}(j)} L_{ij}. \]

By virtue of (8), it becomes clear that the invariance of the graph under $\sigma$ is equivalent to 
the invariance of every single rate-matrix of the model under $\sigma$. In particular, the invariance 
of the graph implies the invariance of $\{L_1, \ldots, L_d\}$ and this proves one implication. The other 
implication follows by using that $\{L_1, \ldots, L_d\}$ is a basis for $\mathfrak{L}$, so their invariance under $\sigma$ implies 
the invariance of every matrix in $\mathfrak{L}$. □ 

Result 6. Considered as continuous-time models $e^{\mathfrak{L}_G}$, the equivariant condition corresponds 
exactly to the invariance of the graph associated to the model, i.e. $\sigma\Gamma = \Gamma$ for all $\sigma \in G$. 

4.2 Statistical considerations 

Result 6 nicely characterizes the equivariant models; a class of models that has allowed a 
number of already existing models to be analyzed simultaneously (for example in the work 
of Draisma & Kuttler (2008) and Casanellas & Fernández-Sánchez (2010)). However, from a 
statistical point of view, invariance of the graph is not particularly well motivated as a symmetry 
of a model. Naturally, as the parameters of a model are "free", and hence need to be fitted using 
some statistical method and observed data, it follows that whatever inference method is used in 
practice, there will be no change to the outcome if the parameters themselves are permuted. 

To make this statement more precise, suppose $F$ represents the maximum likelihood method 
of fitting a given (without loss of generality) two-dimensional continuous-time Markov model 
with rate-matrix $Q = \alpha_1 L_1 + \alpha_2 L_2$ to a binary tree $T$ with edge weights $\theta$ given some data 
$D$. Thus $F$ is a function that returns (via optimization) maximum likelihood estimates of the 
free parameters in the model, i.e. $F(D) = (\hat{\alpha}_1, \hat{\alpha}_2, \hat{\theta})$. Now, if we permute the rate parameters 
in $Q$ by defining $Q' = \alpha_2 L_1 + \alpha_1 L_2$, we claim that the corresponding modified function $F'$ will 
return precisely the same maximum likelihood estimates as $F$, i.e. $F'(D) = (\hat{\alpha}_1, \hat{\alpha}_2, \hat{\theta})$. This 
is because the difference in the two is in parameter labels only and the optimization routine 
performed to find the maximum likelihood estimates will be unaffected by this. This observation 
generalizes immediately to models with a greater number of free parameters and leads to the 
following characterization of symmetry: 

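Definition 4.5. We say that a Lie Markov model $\mathfrak{L}$ has the symmetry of the group $G \le \mathfrak{S}_n$ if 
there is a basis $B_{\mathfrak{L}} = \{L_1, L_2, \ldots, L_d\}$ of $\mathfrak{L}$ such that 

\[ \sigma \cdot B_{\mathfrak{L}} := \{K_\sigma L_1 K_\sigma^{-1}, K_\sigma L_2 K_\sigma^{-1}, \ldots, K_\sigma L_d K_\sigma^{-1}\} = B_{\mathfrak{L}}, \qquad \forall \sigma \in G, \]

and $G$ is the largest subgroup of $\mathfrak{S}_n$ with this property. 

This definition means that $G$ acts by simply permuting the elements of a basis $B_{\mathfrak{L}}$. Crucially, 
once such a basis is fixed, there is a group homomorphism $\rho : G \le \mathfrak{S}_n \to \mathfrak{S}_d$, where $d$ is the 
dimension of the model. That is, for all $\sigma \in G$ and $1 \le i \le d$, we have 

\[ K_\sigma L_i K_\sigma^{-1} = L_{\rho(\sigma)(i)}, \]

where $\rho(\sigma) \in \mathfrak{S}_d$. Also, the definition means that $\mathfrak{L}$ is invariant when considered as a vector 
space: 

\[ \mathfrak{L} = \langle L_1, L_2, \ldots, L_d \rangle_{\mathbb{C}} \mapsto \sigma \cdot \mathfrak{L} := \langle K_\sigma L_1 K_\sigma^{-1}, K_\sigma L_2 K_\sigma^{-1}, \ldots, K_\sigma L_d K_\sigma^{-1} \rangle_{\mathbb{C}} = \mathfrak{L}. \]

However, this is weaker than the condition given in the definition and therefore should not be 
seen as equivalent.$^3$ 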

For nucleotide models with "maximal" symmetry $\mathfrak{S}_4$, we see that, as measured by the model 
itself, there is no way of placing the nucleotides into any preferred groupings. Indeed, any 
statistical inference method using such models will return the same answer no matter how the 
nucleotides are permuted (because such a permutation can be accounted for by a corresponding 
permutation of parameters). It is somewhat surprising to learn (for the authors at least) that 
the largest symmetry of the Kimura 3ST model is $\mathfrak{S}_4$ itself (see Result 9 below). 

Result 7. The general n-state Markov Lie algebra $\mathfrak{L}_{GMM}$ has $\mathfrak{S}_n$ symmetry. 



3 Although the weaker vector space condition is consistent with the statistical motivations outlined above, we 
use the stronger condition given in Definition 4.5 as it greatly simplifies the search for Lie Markov models. 

Proof. $\mathfrak{L}_{GMM}$ has basis $B_{GMM} = \{L_{ij}\}_{i \neq j}$. Recalling Proposition 4.2 we see that 

\[ \sigma \cdot B_{GMM} = \{L_{\sigma(i)\sigma(j)}\}_{i \neq j} = \{L_{ij}\}_{i \neq j} = B_{GMM}, \qquad \forall \sigma \in \mathfrak{S}_n. \]

□ 



4.3 Group-based models 

Before proceeding to our general scheme, we end this section by showing that the group-based 
models are examples of Lie Markov models. 

Given an abelian group $G$ of order $|G| = n$, a group-based model is defined by considering the 
n group elements as the states of a continuous-time Markov chain with non-diagonal rate-matrix 
elements $[Q]_{ab}$ depending only on the difference $b - a$ (where the group operation is written as 
addition and its inverse is written as subtraction). That is, we can write $[Q]_{ab} = [Q]_{b-a}$ for all 
$a, b \in G$. As the possible values of the differences $b - a$ cover the whole of the group, by setting 
$\sigma = b - a$ we see that these models have n free parameters $\alpha_\sigma := [Q]_{ab} = [Q]_{b-a}$. The interest in 
these group-based models is that they give rise to models with inherent symmetry (for example 
the Kimura 3ST model is exactly the group-based model produced by taking the abelian group 
$G = \mathbb{Z}_2 \times \mathbb{Z}_2 \le \mathfrak{S}_4$). 

Presently we will show that the group-based models form Lie Markov models by using standard 
results from the representation theory of finite groups. We recommend Sagan (2001) as an 
elementary text for the reader unfamiliar with the basic theory. 

Definition 4.6. A representation of a group $G$ is a map $\rho : G \to GL(V) \cong GL(m, \mathbb{C})$, where 
$V \cong \mathbb{C}^m$ and $\rho(g_1 g_2) = \rho(g_1)\rho(g_2)$ for all $g_1, g_2 \in G$. We say that $\rho$ provides an action of $G$ on 
the vector space $V$, and that $V$ forms a G-module (or simply a module when $G$ is clear from the 
context). 

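Given a rate-matrix $Q$ taken from a group-based model, consider the matrix derivatives: 

\[ L_\sigma := \frac{\partial Q}{\partial \alpha_\sigma}. \]

Notice that we can write $Q = \sum_{\sigma \in G} \alpha_\sigma L_\sigma$, and observe that $L_\sigma$ has matrix elements 

\[ [L_\sigma]_{ab} = \begin{cases} 1, & \text{if } b - a = \sigma, \\ -1, & \text{if } b - a = 0, \\ 0, & \text{otherwise.} \end{cases} \tag{10} \]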

Definition 4.7. The regular representation of a group $G$ with elements $\{\sigma_1, \sigma_2, \ldots, \sigma_n\}$ is defined 
by taking the n-dimensional G-module 

\[ \langle G \rangle_{\mathbb{C}} = \langle \sigma_1, \sigma_2, \ldots, \sigma_n \rangle_{\mathbb{C}} = \{v = v_1\sigma_1 + v_2\sigma_2 + \ldots + v_n\sigma_n : v_i \in \mathbb{C}\}, \]

and group action defined as $v \mapsto \sigma \cdot v = v_1(\sigma\sigma_1) + v_2(\sigma\sigma_2) + \ldots + v_n(\sigma\sigma_n)$. 

As, from Cayley's theorem, a group acts on itself by permutation, it is clear that the regular 
representation gives rise to permutation matrices. As we did in the case of permutation groups, 
we will denote the permutation matrix corresponding to $\sigma \in G$ by $K_\sigma$, so that the map $v \mapsto \sigma v$ 
is written as 

\[ v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \mapsto K_\sigma \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}. \]

If we label the entries of $K_\sigma$ by the elements of the group $G$, it follows that 

\[ [K_\sigma]_{ab} = \begin{cases} 1, & \text{if } b - a = \sigma, \\ 0, & \text{otherwise.} \end{cases} \tag{11} \]

Comparing (10) and (11), it is clear that: 

Lemma 4.8. The tangent vectors of a group-based model are given by $L_\sigma = -1 + K_\sigma$. 

Observing that $L_\sigma = -1 + K_\sigma$ is a rate-matrix for any $\sigma \in G$, Lemma 4.8 allows us to extend 
the concept of group-based models to the non-abelian case: 

Definition 4.9. Given a (possibly non-abelian) group $G$, the corresponding group-based model is 
given by $\mathfrak{L}_G := \langle \{L_\sigma\}_{\sigma \in G} \rangle_{\mathbb{C}}$, where $L_\sigma = -1 + K_\sigma$ and $K_\sigma$ is the permutation matrix representing 
$\sigma$ in the regular representation (see Definition 4.7). 

Proposition 4.3. The group-based model $\mathfrak{L}_G = \langle \{L_\sigma\}_{\sigma \in G} \rangle_{\mathbb{C}}$ is a Lie Markov model. 

Proof. Consider the commutator of two tangent vectors arising from a group-based model: 

\[ [L_\sigma, L_{\sigma'}] = [-1 + K_\sigma, -1 + K_{\sigma'}] = [K_\sigma, K_{\sigma'}] = K_{\sigma\sigma'} - K_{\sigma'\sigma} = L_{\sigma\sigma'} - L_{\sigma'\sigma}. \]

Thus the tangent vectors of a group-based model are closed under the operation of taking Lie 
brackets and hence form a Lie algebra. □ 
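
The commutator identity in this proof can be illustrated for a non-abelian group. The following Python sketch takes $G = \mathfrak{S}_3$ (our choice for the example), builds the regular representation with the left-multiplication convention $K_\sigma e_\tau = e_{\sigma\tau}$ (an assumption about conventions; the names `compose`, `K` and `Lg` are hypothetical), and checks $[L_\sigma, L_{\sigma'}] = L_{\sigma\sigma'} - L_{\sigma'\sigma}$ for all pairs.

```python
from itertools import permutations

G = list(permutations(range(3)))          # the six elements of S_3
INDEX = {g: k for k, g in enumerate(G)}   # position of each element in the basis
n = len(G)


def compose(s, t):
    """Composition s after t, as a tuple."""
    return tuple(s[t[i]] for i in range(len(t)))


def K(s):
    """Regular-representation matrix: column of t has its 1 in the row of s*t."""
    M = [[0] * n for _ in range(n)]
    for t in G:
        M[INDEX[compose(s, t)]][INDEX[t]] = 1
    return M


def Lg(s):
    """Tangent vector L_s = -1 + K_s."""
    M = K(s)
    for k in range(n):
        M[k][k] -= 1
    return M


def sub(A, B):
    return [[A[r][c] - B[r][c] for c in range(n)] for r in range(n)]


def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def bracket(A, B):
    return sub(matmul(A, B), matmul(B, A))


def commutator_identity_holds():
    return all(bracket(Lg(s), Lg(t)) == sub(Lg(compose(s, t)), Lg(compose(t, s)))
               for s in G for t in G)
```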

5 General scheme for producing Lie Markov models 

Recall that in §3 we observed that there is a $\mathbb{CP}^1$ continuum of two-state, one-dimensional 
Lie Markov models. This is an early indication that classifying arbitrary Lie Markov models is 
a difficult task. On the one hand, the problem seems to simply be to find all sub-algebras of 
$\mathfrak{L}_{GMM}$; however, it must be kept in mind that we also need to ensure that these sub-algebras 
have a stochastic basis (as in Definition 2.1). We cannot use the classical Killing-Cartan-Dynkin 
classification of semi-simple Lie algebras (see Erdmann & Wildon (2006)), as these results rely on 
isomorphism classes over $\mathbb{C}$ and this is completely incompatible with the concept of a stochastic 
basis. Thus producing a classification of Lie Markov models appears to be rather non-trivial and 
would rely on careful considerations of the geometry of the Lie bracket operation when restricted 
to a stochastic cone (cf. Definition 2.7). 

However, we also learnt in §3 that the search for Lie Markov models can be significantly 
simplified by demanding that the models have symmetry (this successfully reduced an infinite 
continuum of two-state models to just two special cases). In what follows we rely heavily 
on using symmetry to assist in the search for Lie Markov models. Of course, it is expected that 
the larger the symmetry we demand, the easier the analysis will be. 

5.1 Background on Group representation theory 

In what follows we recall and implement basic results from the representation theory of the 
symmetric group $\mathfrak{S}_n$. Again, we recommend Sagan (2001) as an excellent introduction to the 
required material. 

Recall that a partition of n is a sequence of non-negative integers $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_r)$, where 
$\lambda_i \ge \lambda_{i+1}$ for all $1 \le i < r$ and $\sum_{i=1}^{r} \lambda_i = n$. We sometimes write $\lambda = \{\lambda_1^{n_1} \lambda_2^{n_2} \ldots \lambda_s^{n_s}\}$ to denote 
the partition that has $n_i$ copies of the integer $\lambda_i$, $1 \le i \le s$. For example, $\lambda = (5, 5, 4, 2, 2, 1) = 
\{5^2\, 4\, 2^2\, 1\}$ is a partition of 19. 

Recall also that a representation is said to be irreducible if it does not contain any proper, 
non-trivial G-submodule. 

Lemma 5.1. The irreducible representations of $\mathfrak{S}_n$ are in one-to-one correspondence with the 
partitions of n. 

We write $\rho_\lambda : \mathfrak{S}_n \to GL(V^\lambda) \cong GL(m, \mathbb{C})$ for the irreducible representation corresponding 
to the partition $\lambda$, where $V^\lambda \cong \mathbb{C}^m$ is the module carrying the representation. In what follows, we 
will abuse notation and use the exponent notation to refer to both the partition itself and to the 
$\mathfrak{S}_n$-module that carries the representation $\rho_\lambda$. 

Suppose we have a representation $\rho : \mathfrak{S}_n \to GL(V)$, for some complex vector space $V$. As 
$\mathfrak{S}_n$ is finite, this representation is completely reducible into irreducible parts. Hence we may 
write (Maschke's theorem): 

\[ V \cong \bigoplus_\lambda c_\lambda V^\lambda, \]

where the sum is over all partitions $\lambda$ of n, and the $c_\lambda$ are non-negative integers specifying 
the number of copies of the irreducible module $V^\lambda$ that appear in the decomposition. For 
example, recall that the defining representation is given by the action of $\mathfrak{S}_n$ on the n-dimensional 
vector space $\mathbb{C}^n = \langle \{e_i\}_{1 \le i \le n} \rangle_{\mathbb{C}}$ defined by $\sigma : e_i \mapsto e_{\sigma(i)}$. It is a well-known result that the 
defining representation decomposes as $\{n\} \oplus \{n-1, 1\}$, where $\{n\}$ is the one-dimensional trivial 
representation and the irreducible representation $\{n-1, 1\}$ therefore has dimension $(n-1)$. 
Consider the projection operators: 

\[ \Theta_\lambda := \frac{1}{n!} \sum_{\sigma \in \mathfrak{S}_n} \chi^\lambda(\sigma)\, \sigma, \]

where $\chi^\lambda$ is the character of the irreducible representation $\lambda$. We recall that these operators 
project a given module onto its irreducible parts, i.e. $\Theta_\lambda V = c_\lambda V^\lambda$. In this way we can use the 
$\Theta_\lambda$ to compute the $c_\lambda$. 

5.2 The general procedure 

Suppose we have a Markov Lie algebra $\mathfrak{L}$ and a permutation group $G \le \mathfrak{S}_n$. We proceed 
by exploiting the action of $G$ on $\mathfrak{L}$ considered first as a discrete structure, via a choice of 
basis $B_{\mathfrak{L}} = \{L_1, L_2, \ldots, L_d\}$, and secondly as a linear structure; that is, the vector space $\mathfrak{L} = 
\langle L_1, L_2, \ldots, L_d \rangle_{\mathbb{C}}$. 

We demand that $\mathfrak{L}$ satisfies the conditions of Definition 4.5 for the permutation group $G$. Thus 
an action of $G$ is defined on the basis $B_{\mathfrak{L}} = \{L_1, L_2, \ldots, L_d\}$ by $\sigma \in G : L_i \mapsto L_{\rho(\sigma)(i)}$, where $\rho$ is 
the homomorphism $\rho : G \to \mathfrak{S}_d$. An orbit of this action is a subset $B = \{L_{a_1}, L_{a_2}, \ldots, L_{a_{|B|}}\} \subseteq 
B_{\mathfrak{L}}$ such that $B$ is invariant: 

\[ \sigma B := \{L_{\rho(\sigma)(a_1)}, L_{\rho(\sigma)(a_2)}, \ldots, L_{\rho(\sigma)(a_{|B|})}\} = B, \]

for all $\sigma \in G$, and $B$ contains no smaller subsets with this property (i.e. $B$ is minimal). Notice 
that this defines an equivalence relation that decomposes $B_{\mathfrak{L}}$ into disjoint orbits of $G$: 

\[ B_{\mathfrak{L}} = B_1 \cup B_2 \cup \ldots \cup B_r, \]

where $\sigma B_i = B_i$ for all $i$ and $\sigma \in G$. 

Now, it is a remarkable result of group actions that, up to bijective correspondence, a complete 
list of the orbits of a given group $G$ can be expressed using the orbit stabilizer theorem (see 
Bogopolski (2008) for example), as follows. Consider a subgroup $H \le G$ and partition $G$ into 
disjoint (right) cosets $G/H = \{eH = H, \sigma_2 H, \ldots, \sigma_q H\}$ where each $\sigma_i \in G$ is chosen such that 
$\sigma_i H = \sigma_j H \Leftrightarrow i = j$ and $q = |G/H| = |G|/|H|$. Thus $G/H$ is a finite set with an action of $G$ defined 
by $\sigma : \sigma_i H \mapsto (\sigma\sigma_i)H$. The orbit stabilizer theorem then says that there is a bijection of any 
orbit $B$ with $G/G_x$, where $G_x = \{g \in G : gx = x\}$ is the stabilizer of some element $x \in B$. As 
$G_x \le G$, and there are only finitely many subgroups of $G$, it is thus possible to give a complete 
list of the orbits of $G$ (up to isomorphism) by simply listing all $G/H$ with $H \le G$. 
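
To make the orbit stabilizer bookkeeping concrete, the sketch below (illustrative Python with hypothetical helper names and 0-based state labels) computes orbits and stabilizers for $\mathfrak{S}_4$ acting on two kinds of objects used later in the paper: the index pair $(i,j)$ labelling $L_{ij}$, and a bipartition such as $12|34$. In each case the product of orbit size and stabilizer size should equal $|\mathfrak{S}_4| = 24$.

```python
from itertools import permutations

S4 = list(permutations(range(4)))


def orbit(x, act):
    """Set of images of x under every group element."""
    return {act(p, x) for p in S4}


def stabilizer(x, act):
    """Group elements that fix x."""
    return [p for p in S4 if act(p, x) == x]


def act_pair(p, ij):
    """Action on the label (i, j) of the basis element L_ij."""
    return (p[ij[0]], p[ij[1]])


def act_split(p, split):
    """Action on a bipartition such as 12|34, stored as a frozenset of blocks."""
    return frozenset(frozenset(p[i] for i in block) for block in split)


pair = (0, 1)                                             # label of L_12
split = frozenset({frozenset({0, 1}), frozenset({2, 3})})  # 12|34

pair_sizes = (len(orbit(pair, act_pair)), len(stabilizer(pair, act_pair)))
split_sizes = (len(orbit(split, act_split)), len(stabilizer(split, act_split)))
```

The stabilizer of the bipartition has order eight, which is consistent with it being a copy of the subgroup used as the illustrative case $H$ in Section 6.2, although this sketch does not identify the subgroup explicitly.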

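Secondly, we can consider $\mathfrak{L}_{GMM} = \{Q = \sum_{i \neq j} \alpha_{ij} L_{ij} : \alpha_{ij} \in \mathbb{C}\}$ as a G-module by linearly 
extending the action (9): 

\[ \sigma : Q \mapsto K_\sigma Q K_\sigma^{-1} = \sum_{i \neq j} \alpha_{ij} L_{\sigma(i)\sigma(j)}, \]

for all $\sigma \in G$. By virtue of Maschke's theorem, we can also decompose $\mathfrak{L}_{GMM}$ into irreducible 
G-modules $V^\lambda \subseteq \mathfrak{L}_{GMM}$ of the linear action of $G$. 

On the other hand, for each $H \le G$ we can extend the set $G/H$ to a complex vector space 

\[ \langle G/H \rangle_{\mathbb{C}} = \langle H, \sigma_2 H, \ldots, \sigma_q H \rangle_{\mathbb{C}} = \{v = c_1[e] + c_2[\sigma_2] + \ldots + c_q[\sigma_q] : c_i \in \mathbb{C}\}, \]

where $[\sigma] = \sigma H$, and consider this vector space as a G-module via the mapping 

\[ \sigma : v = c_1[e] + c_2[\sigma_2] + \ldots + c_q[\sigma_q] \mapsto v' = c_1[\sigma] + c_2[\sigma\sigma_2] + \ldots + c_q[\sigma\sigma_q]. \]

We can then compare the irreducible G-modules that occur in the decomposition of $\mathfrak{L}_{GMM}$ to 
those that occur in the decomposition of $\langle G/H \rangle_{\mathbb{C}}$ for each $H \le G$. Finally, we can attempt 
to construct sub-algebras $\mathfrak{L} \subseteq \mathfrak{L}_{GMM}$ with a basis $B_{\mathfrak{L}}$ such that $B_{\mathfrak{L}} = B_1 \cup B_2 \cup \ldots \cup B_r$ is a 
plausible union of orbits $B_i$ that are consistent with the linear decomposition of $\mathfrak{L}_{GMM}$ induced 
by the action of $G$. 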

Example. Consider the action of $\mathfrak{S}_2 = \{e, (12)\}$ on the two-state general Markov model 
considered as the finite set $B_{GMM} = \{L_{12}, L_{21}\}$. As $(12) \cdot L_{12} = L_{21}$ we see immediately that $B_{GMM}$ 
contains only one $\mathfrak{S}_2$-orbit (namely itself). On the other hand, the subgroups of $\mathfrak{S}_2$ are the 
trivial group $H_1 := \{e\}$ and $H_2 := \mathfrak{S}_2$ itself. Writing $\sigma = (12)$, we have the bijections: 

\[ \mathfrak{S}_2/H_1 := \{eH_1, \sigma H_1\} = \{\{e\}, \{\sigma\}\} \leftrightarrow \mathfrak{S}_2, \tag{12} \]

and 

\[ \mathfrak{S}_2/H_2 := \{eH_2, \sigma H_2\} = \{\{e, \sigma\}, \{\sigma, e\}\} = \{\{e, \sigma\}\} \leftrightarrow \{e\}. \tag{13} \]

Thus, by comparing the cardinality of $B_{GMM}$ to these two orbits, we conclude that $B_{GMM} \cong 
\mathfrak{S}_2/H_1 \cong \mathfrak{S}_2$, which is to say that the action of $\mathfrak{S}_2$ on $B_{GMM}$ is isomorphic to the action of $\mathfrak{S}_2$ 
on itself (cf. Cayley's theorem). 

Now, consider the action of $\mathfrak{S}_2$ on the two-state general Markov Lie algebra considered as 
a complex vector space: $\mathfrak{L}_{GMM} = \langle L_{12}, L_{21} \rangle_{\mathbb{C}}$. It is well known that there are exactly two 
irreducible $\mathfrak{S}_2$-modules $V^{id}$ and $V^{sgn}$ (both one-dimensional, $V^{id} \cong V^{sgn} \cong \mathbb{C}$), where the $\mathfrak{S}_2$ 
action is given by 

\[ v \in V^{id} \mapsto \sigma v = v, \qquad v \in V^{sgn} \mapsto \sigma v = \mathrm{sgn}(\sigma) v, \]

for all $\sigma \in \mathfrak{S}_2$. The orbit $\mathfrak{S}_2/H_1$ described in (12) has size two and linear decomposition 
$\langle \mathfrak{S}_2/H_1 \rangle_{\mathbb{C}} \cong V^{id} \oplus V^{sgn}$, whereas the orbit in (13) has size one and has linear decomposition 
$\langle \mathfrak{S}_2/H_2 \rangle_{\mathbb{C}} \cong V^{id}$. If we define $L_{id} := L_{12} + L_{21}$ and $L_{sgn} := L_{12} - L_{21}$ we see immediately 
that $\langle L_{id} \rangle_{\mathbb{C}} \cong V^{id}$ and $\langle L_{sgn} \rangle_{\mathbb{C}} \cong V^{sgn}$. Thus we conclude that $\mathfrak{L}_{GMM} = \langle L_{id} \rangle_{\mathbb{C}} \oplus \langle L_{sgn} \rangle_{\mathbb{C}} \cong 
V^{id} \oplus V^{sgn}$, and we see that the only possible two-state Lie Markov models with $\mathfrak{S}_2$ symmetry 
are exactly the two cases given in Result 5. 

These ideas can be generalized to models with a greater number of states. The general 
procedure for generating an n-state Lie Markov model, $\mathfrak{L}$, with $G \le \mathfrak{S}_n$ symmetry is as follows. 

1. Decompose the Lie algebra of the general Markov model into irreducible modules of $G$ 
(Maschke's theorem): $\mathfrak{L}_{GMM} = \bigoplus_\lambda f_\lambda \mathfrak{L}^\lambda \cong \bigoplus_\lambda f_\lambda V^\lambda$, where $\lambda$ labels the irreducible 
G-module $V^\lambda \cong \mathfrak{L}^\lambda$ and the $f_\lambda$ are integers specifying how many times each irreducible 
module occurs in the decomposition. 

2. Apply the orbit stabilizer theorem and construct the list of G-orbits, $G/H_i$, by working 
through the subgroups $H_i \le G$. 

3. Extend each of the orbits linearly over $\mathbb{C}$ to the G-module $\langle G/H_i \rangle_{\mathbb{C}}$ and decompose each 
into irreducible G-modules: $\langle G/H_i \rangle_{\mathbb{C}} \cong \bigoplus_\lambda h^{(i)}_\lambda V^\lambda$, where again the $h^{(i)}_\lambda$ are integers. 

4. Working up in dimension $d$, consider all unions of G-orbits $S := (G/H_1) \cup (G/H_2) \cup \ldots \cup 
(G/H_q)$ such that $|S| = \sum_{1 \le i \le q} |G/H_i| = d$ (where $|\cdot|$ stands for cardinality). 

5. For each $S$, consider its linear decomposition into irreducible G-modules: $\langle S \rangle_{\mathbb{C}} \cong \bigoplus_\lambda a_\lambda V^\lambda$ 
where $a_\lambda := h^{(1)}_\lambda + h^{(2)}_\lambda + \ldots + h^{(q)}_\lambda$, and, in order to exclude unions of G-orbits that do not 
occur in the linear decomposition of $\mathfrak{L}_{GMM}$ as a G-module, check that $a_\lambda \le f_\lambda$ for each 
$\lambda$. 

6. For each case thus identified, consider the vector space $\mathfrak{L} := \bigoplus_\lambda a_\lambda \mathfrak{L}^\lambda$ and use explicit 
computation to check whether $\mathfrak{L}$ forms a Lie algebra. 

7. If $\mathfrak{L}$ forms a Lie algebra, attempt to show that it has a stochastic basis. 
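
The "explicit computation" of step 6 amounts to testing whether the commutator of every pair of spanning matrices stays inside the span. One way this might be done is sketched below in Python with exact rational arithmetic; the function names are ours, the three-state examples are chosen only for illustration, and the stochastic-basis check of step 7 is not attempted.

```python
from fractions import Fraction


def L(i, j, n):
    """Elementary rate matrix L_ij = E_ij - E_jj (0-based indices)."""
    M = [[0] * n for _ in range(n)]
    M[i][j] += 1
    M[j][j] -= 1
    return M


def bracket(A, B):
    n = len(A)
    mul = lambda X, Y: [[sum(X[r][k] * Y[k][c] for k in range(n))
                         for c in range(n)] for r in range(n)]
    AB, BA = mul(A, B), mul(B, A)
    return [[AB[r][c] - BA[r][c] for c in range(n)] for r in range(n)]


def rank(mats):
    """Rank of the span of the given matrices (Gauss-Jordan over the rationals)."""
    rows = [[Fraction(x) for row in M for x in row] for M in mats]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][c] != 0:
                f = rows[r][c] / rows[rk][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def is_lie_closed(basis):
    """True if [A, B] lies in span(basis) for all A, B in basis."""
    r = rank(basis)
    return all(rank(basis + [bracket(A, B)]) == r for A in basis for B in basis)


not_closed = [L(0, 1, 3), L(1, 2, 3)]           # span of L_12 and L_23
closed = [L(0, 1, 3), L(0, 2, 3), L(1, 2, 3)]   # adding L_13 closes the span
```

Here $[L_{12}, L_{23}] = L_{13} - L_{23}$ leaves the first span, whereas the second span is closed under brackets.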

This procedure is guaranteed to produce all Lie Markov models with symmetry $G$ and is best 
understood by studying the examples given in the next section. We have successfully implemented 
it to determine the list of Lie Markov models in the two-state case with $G = \mathfrak{S}_2$, the three-state 
case with $G = \mathfrak{S}_3, \mathbb{Z}_3$ and the four-state case with $G = \mathfrak{S}_4$, $\mathbb{Z}_2 \wr \mathbb{Z}_2$, $\mathbb{Z}_2 \times \mathbb{Z}_2$ and $\mathbb{Z}_4$. In 
§6 we will give a complete presentation of the four-state $G = \mathfrak{S}_4$ case, and defer presentation of 
the other cases to a future publication. However, for n "large" and $G$ "small", it is worth noting 
that the final two steps in the procedure become quite difficult and computationally expensive. 
Clearly, further theoretical ideas, such as those alluded to at the start of this section, are needed 
to describe Lie Markov models in their entirety. 

6 Lie Markov models with $\mathfrak{S}_4$ symmetry 

As we are especially interested in nucleotide evolution, we fix $n = 4$ and use the projection 
operators to decompose the Lie algebra of the general Markov model into irreducible representations 
of $\mathfrak{S}_4$. The partitions of $n = 4$ are $\{4\}$, $\{31\}$, $\{2^2\}$, $\{21^2\}$ and $\{1^4\}$. Each of these partitions 
labels an inequivalent irreducible representation of $\mathfrak{S}_4$ and the corresponding character table is 
given in Table 1. Notice that the first row in the character table gives the dimension of each 
representation. Notice also that there are two one-dimensional representations, namely $\{4\}$, the 
trivial representation where each permutation is mapped to the identity 1, and $\{1^4\}$, the sign 
representation where each permutation is mapped to $\pm 1$ based on the sign of the permutation. 


                {4}    {31}    {2^2}    {21^2}    {1^4}
[e]              1      3       2        3         1
[(12)]           1      1       0       -1        -1
[(123)]          1      0      -1        0         1
[(12)(34)]       1     -1       2       -1         1
[(1234)]         1     -1       0        1        -1

Table 1: The character table of $\mathfrak{S}_4$. The rows are the conjugacy classes and the columns are the 
irreducible characters. 

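We explicitly confirm the decomposition of the defining representation for $n = 4$ by constructing 
the projection operators 

\[ \Theta_{\{4\}} = \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \chi^{\{4\}}(\sigma)\, \sigma = \tfrac{1}{24} \big( e + (12) + (13) + (14) + (23) + \ldots + (1423) + (1432) \big), \]

\[ \Theta_{\{31\}} = \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \chi^{\{31\}}(\sigma)\, \sigma = \tfrac{1}{24} \big( 3e + (12) + (13) + (14) + (23) + (24) + (34) - (12)(34) - (13)(24) - (14)(23) \]
\[ \qquad\qquad - (1234) - (1243) - (1324) - (1342) - (1423) - (1432) \big), \]

where the coefficients are read from the $\{31\}$ column of Table 1 (the 3-cycles have character zero and so do not appear). 

Notice that $\Theta_{\{4\}} \cdot e_i = \tfrac{1}{4}(e_1 + e_2 + e_3 + e_4)$ for all $i$ and $\Theta_{\{31\}} \cdot e_1 = \tfrac{1}{24}(6e_1 - 2e_2 - 2e_3 - 2e_4)$. 
From this we conclude that the defining representation contains both the modules $\{4\}$ and $\{31\}$. 
The dimension count $4 = 1 + 3$ then shows that these are the only irreducible modules occurring 
in the defining representation (or, alternatively, one may check that $\Theta_{\{2^2\}} \cdot e_i = \Theta_{\{21^2\}} \cdot e_i = 
\Theta_{\{1^4\}} \cdot e_i = 0$ for all $i$). Hence: 

\[ \langle \{e_i\}_{1 \le i \le 4} \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\}. \]

We will use this decomposition of the defining representation repeatedly in what follows. 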
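
The action of the projection operators on the defining representation can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the dictionary `CHARS` transcribes Table 1 indexed by cycle type, the normalization $1/24$ is the one used in the text, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Characters of S_4 from Table 1, indexed by cycle type:
# e -> (1,1,1,1), (12) -> (2,1,1), (123) -> (3,1), (12)(34) -> (2,2), (1234) -> (4,)
CHARS = {
    "{4}":    {(1, 1, 1, 1): 1, (2, 1, 1): 1,  (3, 1): 1,  (2, 2): 1,  (4,): 1},
    "{31}":   {(1, 1, 1, 1): 3, (2, 1, 1): 1,  (3, 1): 0,  (2, 2): -1, (4,): -1},
    "{2^2}":  {(1, 1, 1, 1): 2, (2, 1, 1): 0,  (3, 1): -1, (2, 2): 2,  (4,): 0},
    "{21^2}": {(1, 1, 1, 1): 3, (2, 1, 1): -1, (3, 1): 0,  (2, 2): -1, (4,): 1},
    "{1^4}":  {(1, 1, 1, 1): 1, (2, 1, 1): -1, (3, 1): 1,  (2, 2): 1,  (4,): -1},
}


def cycle_type(p):
    """Cycle lengths of the permutation p, sorted in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            k, length = start, 0
            while k not in seen:
                seen.add(k)
                k = p[k]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def project_e(label, i=0):
    """Coordinates of Theta_label . e_i in the standard basis e_1, ..., e_4."""
    chi = CHARS[label]
    v = [Fraction(0)] * 4
    for p in permutations(range(4)):
        v[p[i]] += Fraction(chi[cycle_type(p)], 24)  # sigma sends e_i to e_{sigma(i)}
    return v
```

With `i=0` standing for $e_1$, the `{31}` projector should return the coordinates of $\tfrac{1}{24}(6e_1 - 2e_2 - 2e_3 - 2e_4)$, and the last three projectors should return the zero vector.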

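We would like to decompose $\mathfrak{L}_{GMM} = \langle \{L_{ij}\}_{1 \le i \neq j \le 4} \rangle_{\mathbb{C}}$ into irreducible modules of $\mathfrak{S}_4$. Recall 
that the action of $\mathfrak{S}_4$ is defined by $\sigma : L_{ij} \mapsto L_{\sigma(i)\sigma(j)}$. Compare this to the action of $\mathfrak{S}_4$ 
on the tensor product space $\mathbb{C}^4 \otimes \mathbb{C}^4 = \langle \{e_i \otimes e_j\}_{1 \le i,j \le 4} \rangle_{\mathbb{C}}$ defined by $\sigma : e_i \otimes e_j \mapsto e_{\sigma(i)} \otimes 
e_{\sigma(j)}$. As any tangent vector $L \in \mathfrak{L}_{GMM}$ can be expressed as $L = \sum_{1 \le i \neq j \le 4} \alpha_{ij} L_{ij}$, we see 
immediately that the action of $\mathfrak{S}_4$ on $\mathfrak{L}_{GMM}$ is isomorphic to the action on the subspace of tensors 
$\{\psi \in \mathbb{C}^4 \otimes \mathbb{C}^4 : \psi_{ii} = 0, 1 \le i \le 4\}$. If we disregard the constraints $\psi_{ii} = 0$ for a moment, what 
we have is the Kronecker product of two copies of the defining representation: i.e. $\mathbb{C}^4 \cong \{4\} \oplus \{31\}$ 
and $\mathbb{C}^4 \otimes \mathbb{C}^4 \cong (\{4\} \oplus \{31\}) \otimes (\{4\} \oplus \{31\})$. Referring to Table 1 and appealing to orthogonality 
of irreducible characters, we find that 

\[ \mathbb{C}^4 \otimes \mathbb{C}^4 \cong (\{4\} \oplus \{31\}) \otimes (\{4\} \oplus \{31\}) = 2\{4\} \oplus 3\{31\} \oplus \{2^2\} \oplus \{21^2\}. \]

Now the subspace spanned by the $e_i \otimes e_i$ is itself isomorphic to the defining representation: 
$\langle \{e_i \otimes e_i\}_{1 \le i \le 4} \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\}$, and this subspace must appear in the decomposition of $\mathbb{C}^4 \otimes \mathbb{C}^4$. 
Setting $\psi_{ii} = 0$ is the same as removing this subspace from the decomposition. Thus we have: 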

Result 8. The decomposition of the four-state general rate-matrix model $\mathfrak{L}_{GMM}$ into irreducible 
representations of $\mathfrak{S}_4$ is given by 

\[ \mathfrak{L}_{GMM} \cong \{4\} \oplus 2\{31\} \oplus \{2^2\} \oplus \{21^2\}, \tag{14} \]

where the decomposition of the dimension is given by $12 = 1 + 2 \times 3 + 2 + 3$. 

6.1 A convenient basis 

We would now like to present an explicit basis for each module present in the decomposition 
(14). Consider the vector 

\[ L_{id} := \sum_{1 \le i \neq j \le 4} L_{ij} = L_{12} + L_{13} + \ldots + L_{43} = \begin{pmatrix} -3 & 1 & 1 & 1 \\ 1 & -3 & 1 & 1 \\ 1 & 1 & -3 & 1 \\ 1 & 1 & 1 & -3 \end{pmatrix}. \]

Clearly this vector is invariant, 

\[ \sigma \cdot L_{id} = L_{id}, \]

and it hence spans the trivial representation: $\langle L_{id} \rangle_{\mathbb{C}} \cong \{4\}$. Thus $\langle L_{id} \rangle_{\mathbb{C}}$ accounts for the first 
module appearing in the decomposition (14). 
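
The multiplicities in the decomposition (14) can be cross-checked with the character inner product. The Python sketch below is an illustration that assumes the character values of Table 1 and uses the fact that the character of the permutation action on the twelve ordered pairs $(i,j)$, $i \neq j$, is $f(f-1)$, where $f$ is the number of fixed points of the permutation; the names `CLASSES`, `CHARS` and `multiplicity` are ours.

```python
from fractions import Fraction

# Conjugacy classes of S_4 as (class size, number of fixed points), in the
# order e, [(12)], [(123)], [(12)(34)], [(1234)] used in Table 1.
CLASSES = [(1, 4), (6, 2), (8, 1), (3, 0), (6, 0)]

# Columns of Table 1, listed in the same class order.
CHARS = {
    "{4}":    [1, 1, 1, 1, 1],
    "{31}":   [3, 1, 0, -1, -1],
    "{2^2}":  [2, 0, -1, 2, 0],
    "{21^2}": [3, -1, 0, -1, 1],
    "{1^4}":  [1, -1, 1, 1, -1],
}


def multiplicity(label):
    """Number of copies of the irreducible module `label` inside L_GMM."""
    total = sum(size * chi * fixed * (fixed - 1)
                for (size, fixed), chi in zip(CLASSES, CHARS[label]))
    return Fraction(total, 24)


decomposition = {label: multiplicity(label) for label in CHARS}
```

Under these assumptions the multiplicities come out as one, two, one, one and zero respectively, in agreement with (14).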
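Define the row sum vectors 

\[ R_i := \sum_{j : 1 \le i \neq j \le 4} L_{ij}, \]

and the corresponding column sum vectors 

\[ C_j := \sum_{i : 1 \le i \neq j \le 4} L_{ij}. \]

For example, we have 

\[ R_2 = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \qquad C_4 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -3 \end{pmatrix}. \]

Consider the action of $\mathfrak{S}_4$ on each of these vectors: 

\[ \sigma : R_i \mapsto \sigma R_i = \sum_{j : 1 \le i \neq j \le 4} L_{\sigma(i)\sigma(j)} = R_{\sigma(i)}, \]

\[ \sigma : C_j \mapsto \sigma C_j = \sum_{i : 1 \le i \neq j \le 4} L_{\sigma(i)\sigma(j)} = C_{\sigma(j)}. \]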

Clearly these actions are isomorphic to the defining representation: $\sigma : e_i \mapsto e_{\sigma(i)}$. Therefore, the 
(invariant) subspace generated by the row sum vectors, as well as that generated by the column 
sum vectors, is isomorphic as an $\mathfrak{S}_4$-module to the defining representation: 

\[ \langle R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}} \cong \langle C_1, C_2, C_3, C_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\}. \tag{15} \]

Notice that these vectors are linearly independent except for the single linear relation 

\[ R_1 + R_2 + R_3 + R_4 = C_1 + C_2 + C_3 + C_4 = L_{id}. \]

Keeping this linear dependence in mind, we may write 

\[ \langle L_{id} \rangle_{\mathbb{C}} + \langle R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}} + \langle C_1, C_2, C_3, C_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus 2\{31\}, \]

and we see that we have now accounted for the first three modules occurring in the decomposition 
(14). 
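
As a sanity check on the row and column sum vectors, the following Python sketch (illustrative only; 0-based indices, hypothetical helper names, and the convention $L_{ij} = E_{ij} - E_{jj}$) builds $R_i$ and $C_j$, confirms the single linear relation with $L_{id}$, and computes the dimension of their joint span, which should be $1 + 3 + 3 = 7$.

```python
from fractions import Fraction

N = 4


def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (0-based indices)."""
    M = [[0] * N for _ in range(N)]
    M[i][j] += 1
    M[j][j] -= 1
    return M


def add(mats):
    return [[sum(M[r][c] for M in mats) for c in range(N)] for r in range(N)]


def rank(mats):
    """Rank of the span of the given matrices (Gauss-Jordan over the rationals)."""
    rows = [[Fraction(x) for row in M for x in row] for M in mats]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][c] != 0:
                f = rows[r][c] / rows[rk][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


R = [add([L(i, j) for j in range(N) if j != i]) for i in range(N)]  # row sums
C = [add([L(i, j) for i in range(N) if i != j]) for j in range(N)]  # column sums
L_id = add([L(i, j) for i in range(N) for j in range(N) if i != j])
```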

Referring to the graphical representation of the Kimura 3ST model given in Figure 3, we see 
that the Kimura 3ST model has a basis given by: 

\[ B_{K3ST} = \{L_\alpha, L_\beta, L_\gamma\}, \]

with 

\[ L_\alpha = L_{12} + L_{21} + L_{34} + L_{43}, \]
\[ L_\beta = L_{13} + L_{31} + L_{24} + L_{42}, \]
\[ L_\gamma = L_{14} + L_{41} + L_{23} + L_{32}. \]

We claimed in §4.2 that the symmetry of the Kimura 3ST model is all of $\mathfrak{S}_4$. 
We now prove this: 

Result 9. The symmetry of the Kimura 3ST model is $\mathfrak{S}_4$. 

Proof. We consider the following set of bipartitions of the set $\{1, 2, 3, 4\}$: 

\[ S = \{12|34,\ 13|24,\ 14|23\}, \]

where $ij|kl := \{\{i,j\}, \{k,l\}\}$. Now take the bijection between these bipartitions and the three 
tangent vectors of the K3ST model given by: 

\[ ij|kl \mapsto L_{ij} + L_{ji} + L_{kl} + L_{lk}. \tag{16} \]

This map is well-defined and, in particular, 

\[ 12|34 \mapsto L_\alpha, \qquad 13|24 \mapsto L_\beta, \qquad 14|23 \mapsto L_\gamma, \]

and we note relations such as $12|34 = 43|12 \mapsto L_{43} + L_{34} + L_{12} + L_{21} = L_\alpha$. Notice that $\mathfrak{S}_4$ acts 
on the set $S$ by taking 

\[ \sigma : ij|kl \mapsto \sigma(i)\sigma(j)|\sigma(k)\sigma(l), \]

where $i, j, k, l$ are all different. As $S$ is obviously invariant under this action of $\mathfrak{S}_4$, we conclude 
that the same is true for the set $\{L_\alpha, L_\beta, L_\gamma\}$ under the bijection (16). □ 
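
Result 9 can also be confirmed by enumeration. The Python sketch below (an illustration with hypothetical helper names and 0-based indices) conjugates the three K3ST basis matrices by all 24 permutations and checks that the basis is always mapped onto itself as a set, whereas only four permutations fix every basis element individually, which is the equivariance group $\mathbb{Z}_2 \times \mathbb{Z}_2$ of Section 4.1.

```python
from itertools import permutations

N = 4


def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (0-based indices)."""
    M = [[0] * N for _ in range(N)]
    M[i][j] += 1
    M[j][j] -= 1
    return M


def add(mats):
    return [[sum(M[r][c] for M in mats) for c in range(N)] for r in range(N)]


def conjugate(p, M):
    """Return K_p M K_p^{-1}: entry (i, j) of M moves to (p[i], p[j])."""
    out = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            out[p[i]][p[j]] = M[i][j]
    return out


def k3st(i, j, k, l):
    """Tangent vector attached to the bipartition ij|kl, as in (16)."""
    return add([L(i, j), L(j, i), L(k, l), L(l, k)])


BASIS = [k3st(0, 1, 2, 3), k3st(0, 2, 1, 3), k3st(0, 3, 1, 2)]  # L_alpha, L_beta, L_gamma


def permutes_basis(p):
    """True if conjugation by p maps the K3ST basis onto itself as a set."""
    images = [conjugate(p, M) for M in BASIS]
    return all(M in BASIS for M in images) and all(M in images for M in BASIS)


def fixes_basis(p):
    """True if conjugation by p fixes each basis element (equivariance)."""
    return all(conjugate(p, M) == M for M in BASIS)


S4 = list(permutations(range(N)))
```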

This result immediately extends linearly to show that $\mathfrak{L}_{K3ST} = \langle L_\alpha, L_\beta, L_\gamma \rangle_{\mathbb{C}}$ forms a module 
of $\mathfrak{S}_4$ with dimension three, but how does this module fit into the decomposition (14)? Note 
that 

\[ L_\alpha + L_\beta + L_\gamma = L_{id}, \]

so $\mathfrak{L}_{K3ST}$ contains the trivial module $\langle L_{id} \rangle_{\mathbb{C}} \cong \{4\}$. Referring to our decomposition (14), it is 
enough to do a dimension count to see that the only possible decomposition is 

\[ \mathfrak{L}_{K3ST} \cong \{4\} \oplus \{2^2\}. \tag{17} \]

Thus the Kimura 3ST model accounts for the third term in (14) and we are left with accounting 
for the module $\{21^2\}$. 

To this end we define the six antisymmetric combinations 

\[ A_{ij} := L_{ij} - L_{ji}, \qquad 1 \le i < j \le 4. \]

Referring to Table 1, the projector onto the $\{21^2\}$ subspace is given by 

\[ \Theta_{\{21^2\}} = \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \chi^{\{21^2\}}(\sigma)\, \sigma = \tfrac{1}{24} \big( 3e - (12) - (13) - (14) - (23) - (24) - (34) - (12)(34) - (13)(24) \]
\[ \qquad\qquad - (14)(23) + (1234) + (1243) + (1324) + (1342) + (1423) + (1432) \big). \]

Applying this projector to our antisymmetric combinations, we get, for example, 

\[ \Theta_{\{21^2\}} \cdot A_{12} = \tfrac{1}{12} \left( 2A_{12} - A_{13} - A_{14} + A_{23} + A_{24} \right), \tag{18} \]

from which it is easy to see that the general rule is 

\[ P_{ij} := 12\, \Theta_{\{21^2\}} \cdot A_{ij} = 2A_{ij} - A_{ik} - A_{il} + A_{jk} + A_{jl}, \]

where $i, j, k, l$ are all different and $A_{ji} := -A_{ij}$, so that $P_{ji} = -P_{ij}$. We can also see from this that we have (at least) three linearly 
independent relations: 

\[ P_{12} + P_{13} + P_{14} = P_{21} + P_{23} + P_{24} = P_{31} + P_{32} + P_{34} = 0, \]

so the dimension of $\langle \{P_{ij}\}_{1 \le i < j \le 4} \rangle_{\mathbb{C}}$ is at most three. However, as the projection (18) is non-zero, 
and $\{21^2\}$ is three dimensional and irreducible, it must be the case that 

\[ \langle \{P_{ij}\}_{1 \le i < j \le 4} \rangle_{\mathbb{C}} \cong \{21^2\}. \]

Putting all of these results together: 
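
The linear relations among the projected vectors $P_{ij}$ are easy to get wrong by a sign, so a direct check is useful. The Python sketch below is illustrative only (0-based indices, hypothetical helper names); it builds $P_{ij}$ from the general rule above, verifies the antisymmetry $P_{ji} = -P_{ij}$ and the relations $\sum_{j \neq i} P_{ij} = 0$, and computes the dimension of the span.

```python
from fractions import Fraction

N = 4


def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (0-based indices)."""
    M = [[0] * N for _ in range(N)]
    M[i][j] += 1
    M[j][j] -= 1
    return M


def comb(terms):
    """Linear combination of matrices given as (coefficient, matrix) pairs."""
    return [[sum(c * M[r][s] for c, M in terms) for s in range(N)]
            for r in range(N)]


def A(i, j):
    """Antisymmetric combination A_ij = L_ij - L_ji."""
    return comb([(1, L(i, j)), (-1, L(j, i))])


def P(i, j):
    """P_ij = 2 A_ij - A_ik - A_il + A_jk + A_jl with {k, l} the other states."""
    k, l = [m for m in range(N) if m not in (i, j)]
    return comb([(2, A(i, j)), (-1, A(i, k)), (-1, A(i, l)),
                 (1, A(j, k)), (1, A(j, l))])


def rank(mats):
    """Rank of the span of the given matrices (Gauss-Jordan over the rationals)."""
    rows = [[Fraction(x) for row in M for x in row] for M in mats]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][c] != 0:
                f = rows[r][c] / rows[rk][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


ZERO = [[0] * N for _ in range(N)]
P_UPPER = [P(i, j) for i in range(N) for j in range(i + 1, N)]
```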
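Putting all of these results together:

Result 10. The Lie algebra $\mathfrak{L}_{GMM}$ of the four state general Markov model can be expressed as

\[
\begin{aligned}
\mathfrak{L}_{GMM} &= \langle \{L_{ij}\}_{1 \le i \ne j \le 4} \rangle_{\mathbb{C}} \\
&= \langle \{L_{id}\} \cup \{L_\alpha, L_\beta, L_\gamma\} \cup \{R_i\}_{1 \le i \le 4} \cup \{C_i\}_{1 \le i \le 4} \cup \{P_{ij}\}_{1 \le i < j \le 4} \rangle_{\mathbb{C}},
\end{aligned}
\]

with linear dependencies (writing $P_{ji} = -P_{ij}$)

\[
\begin{aligned}
L_{id} &= L_\alpha + L_\beta + L_\gamma = R_1 + R_2 + R_3 + R_4 = C_1 + C_2 + C_3 + C_4, \\
P_{12} + P_{13} + P_{14} &= P_{21} + P_{23} + P_{24} = P_{31} + P_{32} + P_{34} = 0,
\end{aligned}
\]

and decomposition into irreducible representations of $\mathfrak{S}_4$:

\[
\begin{aligned}
\langle L_{id} \rangle_{\mathbb{C}} &\cong \{4\}, \\
\langle L_\alpha, L_\beta, L_\gamma \rangle_{\mathbb{C}} &\cong \{4\} \oplus \{2^2\}, \\
\langle \{R_i\}_{1 \le i \le 4} \rangle_{\mathbb{C}} \cong \langle \{C_i\}_{1 \le i \le 4} \rangle_{\mathbb{C}} &\cong \{4\} \oplus \{31\}, \\
\langle \{P_{ij}\}_{1 \le i < j \le 4} \rangle_{\mathbb{C}} &\cong \{21^2\}.
\end{aligned}
\]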
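The dependencies and dimension counts in Result 10 can be checked numerically. The following is a minimal sketch, assuming the conventions $L_{ij} = E_{ij} - E_{jj}$ (zero column sums), $R_i = \sum_{j \ne i} L_{ij}$ and $C_i = \sum_{j \ne i} L_{ji}$; these explicit forms are defined earlier in the paper and are restated here as assumptions. The helper names (`lin`, `rank`, `L_part`) are illustrative only.

```python
from fractions import Fraction

N = 4
IDX = range(1, N + 1)

def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (assumed convention, 1-based)."""
    m = [[0] * N for _ in range(N)]
    m[i - 1][j - 1] += 1
    m[j - 1][j - 1] -= 1
    return m

def lin(terms):
    """Linear combination of matrices; terms is a list of (coefficient, matrix)."""
    return [[sum(c * m[r][s] for c, m in terms) for s in range(N)] for r in range(N)]

def rank(mats):
    """Dimension of the span of the matrices (exact Gaussian elimination)."""
    rows = [[Fraction(x) for row in m for x in row] for m in mats]
    r = 0
    for col in range(N * N):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def L_part(i, j, k, l):
    """L_{ij|kl} = L_ij + L_ji + L_kl + L_lk."""
    return lin([(1, L(i, j)), (1, L(j, i)), (1, L(k, l)), (1, L(l, k))])

L_id = lin([(1, L(i, j)) for i in IDX for j in IDX if i != j])
L_abc = [L_part(1, 2, 3, 4), L_part(1, 3, 2, 4), L_part(1, 4, 2, 3)]
R = [lin([(1, L(i, j)) for j in IDX if j != i]) for i in IDX]
C = [lin([(1, L(j, i)) for j in IDX if j != i]) for i in IDX]

def A(i, j):
    return lin([(1, L(i, j)), (-1, L(j, i))])

def P(i, j):
    k, l = [m for m in IDX if m not in (i, j)]
    return lin([(2, A(i, j)), (-1, A(i, k)), (-1, A(i, l)), (1, A(j, k)), (1, A(j, l))])

P_all = [P(i, j) for i in IDX for j in IDX if i < j]
ZERO = [[0] * N for _ in range(N)]

# Each family sums to L_id; the P_1j sum to zero; dimension counts.
sums_agree = all(lin([(1, m) for m in fam]) == L_id for fam in (L_abc, R, C))
row_relation = lin([(1, P(1, 2)), (1, P(1, 3)), (1, P(1, 4))]) == ZERO
dim_P = rank(P_all)
dim_total = rank([L_id] + L_abc + R + C + P_all)
print(sums_agree, row_relation, dim_P, dim_total)
```

Exact rational arithmetic is used for the rank computation so that the dimension counts are not affected by floating-point round-off.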

We would of course like to understand the Lie algebra of $\mathfrak{L}_{GMM}$ in this basis. As discussed at the beginning of this section, we are unfortunately bereft of theoretical results that would take us directly from (7) and Result 10 to give the Lie algebra in this basis. Instead, resorting to tedious matrix computations we have found:

Result 11. Using the basis defined by the decomposition into irreducible representations of $\mathfrak{S}_4$ given in Result 10, the Lie algebra of the general Markov model (7) can be expressed as:



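\[
\begin{aligned}
[R_i, R_j] &= R_i - R_j, & [L_\alpha, L_\beta] &= [L_\alpha, L_\gamma] = [L_\beta, L_\gamma] = 0, \\
[L_{ij|kl}, R_i] &= R_j - R_i, & [C_i, C_j] &= R_j - R_i - P_{ij}, \\
[C_i, R_j] &= \delta_{ij} \left( L_{id} - 4R_j \right), & [C_i, L_{ij|kl}] &= R_j - R_i - P_{ij},
\end{aligned}
\tag{19}
\]

with the understanding that $P_{ij} = -P_{ji}$, $L_{12|34} = L_\alpha$, $L_{13|24} = L_\beta$ and $L_{14|23} = L_\gamma$.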
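The "tedious matrix computations" behind Result 11 are easy to spot-check by machine. The sketch below verifies five of the brackets, namely $[R_i,R_j] = R_i - R_j$, the vanishing $[L_\alpha, L_\beta]$-type brackets, $[L_{ij|kl}, R_i] = R_j - R_i$, $[C_i, C_j] = R_j - R_i - P_{ij}$ and $[C_i, R_j] = \delta_{ij}(L_{id} - 4R_j)$. It assumes the conventions $L_{ij} = E_{ij} - E_{jj}$, $R_i = \sum_{j\ne i} L_{ij}$, $C_i = \sum_{j \ne i} L_{ji}$ and $P_{ij} = 2A_{ij} - A_{ik} - A_{il} + A_{jk} + A_{jl}$; the function names are illustrative.

```python
N = 4
IDX = range(1, N + 1)

def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (assumed convention, 1-based)."""
    m = [[0] * N for _ in range(N)]
    m[i - 1][j - 1] += 1
    m[j - 1][j - 1] -= 1
    return m

def lin(terms):
    """Linear combination of matrices; terms is a list of (coefficient, matrix)."""
    return [[sum(c * m[r][s] for c, m in terms) for s in range(N)] for r in range(N)]

def mul(a, b):
    return [[sum(a[r][k] * b[k][s] for k in range(N)) for s in range(N)] for r in range(N)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return lin([(1, mul(a, b)), (-1, mul(b, a))])

def L_part(i, j):
    """L_{ij|kl} for the bipartition containing the pair {i, j}."""
    k, l = [m for m in IDX if m not in (i, j)]
    return lin([(1, L(i, j)), (1, L(j, i)), (1, L(k, l)), (1, L(l, k))])

L_id = lin([(1, L(i, j)) for i in IDX for j in IDX if i != j])
R = {i: lin([(1, L(i, j)) for j in IDX if j != i]) for i in IDX}
C = {i: lin([(1, L(j, i)) for j in IDX if j != i]) for i in IDX}

def A(i, j):
    return lin([(1, L(i, j)), (-1, L(j, i))])

def P(i, j):
    k, l = [m for m in IDX if m not in (i, j)]
    return lin([(2, A(i, j)), (-1, A(i, k)), (-1, A(i, l)), (1, A(j, k)), (1, A(j, l))])

ZERO = [[0] * N for _ in range(N)]
PAIRS = [(i, j) for i in IDX for j in IDX if i != j]

checks = {
    "[R_i,R_j] = R_i - R_j": all(
        bracket(R[i], R[j]) == lin([(1, R[i]), (-1, R[j])]) for i, j in PAIRS),
    "[L_a,L_b] = 0": all(
        bracket(L_part(1, a), L_part(1, b)) == ZERO for a in (2, 3, 4) for b in (2, 3, 4)),
    "[L_ij|kl,R_i] = R_j - R_i": all(
        bracket(L_part(i, j), R[i]) == lin([(1, R[j]), (-1, R[i])]) for i, j in PAIRS),
    "[C_i,C_j] = R_j - R_i - P_ij": all(
        bracket(C[i], C[j]) == lin([(1, R[j]), (-1, R[i]), (-1, P(i, j))]) for i, j in PAIRS),
    "[C_i,R_j] = d_ij (L_id - 4 R_j)": all(
        bracket(C[i], R[j]) == (lin([(1, L_id), (-4, R[j])]) if i == j else ZERO)
        for i in IDX for j in IDX),
}
for name, ok in checks.items():
    print(name, ok)
```

All entries are integers, so the comparisons are exact.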

This nicely gives us an alternative presentation of the Lie algebra of the general Markov model (7) in the basis that explicitly presents the decomposition into irreducible modules of $\mathfrak{S}_4$.

6.2 Application of the general method. 

Following the general scheme of Section 5, our task now is to identify Lie Markov models occurring as sub-algebras in (19). In Table 2 we present the decomposition of the orbits of $\mathfrak{S}_4$. These are computed by using the orbit stabilizer theorem and projecting $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ onto the irreducible module $V^\lambda$ of $\mathfrak{S}_4$ using the projection operator $\Theta_\lambda$. These computations are quite tedious, so here we will simply develop the case $H = \mathbb{Z}_2 \wr \mathbb{Z}_2$ as an illustrative example.

First of all we note that there are three copies of $H \cong \mathbb{Z}_2 \wr \mathbb{Z}_2$ in $\mathfrak{S}_4$:

\[
\begin{aligned}
\mathbb{Z}_2 \wr \mathbb{Z}_2 &\cong \{e, (12), (34), (12)(34), (13)(24), (14)(23), (1324), (1423)\} \\
&\cong \{e, (13), (24), (13)(24), (12)(34), (14)(23), (1234), (1432)\} \\
&\cong \{e, (14), (23), (14)(23), (13)(24), (12)(34), (1243), (1342)\}.
\end{aligned}
\tag{20}
\]

Obviously each of these copies of $\mathbb{Z}_2 \wr \mathbb{Z}_2$ is structurally the same⁴ and hence will result in an isomorphic action of $\mathfrak{S}_4$ on $\mathfrak{S}_4/H$. Choosing the first copy of $\mathbb{Z}_2 \wr \mathbb{Z}_2$ in (20), we have

\[ \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 = \{[e], [(13)], [(14)]\}, \]

where $[\sigma]$ represents the coset in $\mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2$ containing the element $\sigma$, so for example,

\[
\begin{aligned}
[e] &= \{e, (12), (34), (12)(34), (13)(24), (14)(23), (1324), (1423)\}, \\
[(13)] &= \{(13), (123), (134), (1234), (24), (1432), (243), (142)\}, \\
[(14)] &= \{(14), (124), (143), (1243), (1342), (23), (132), (234)\}.
\end{aligned}
\]

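These cosets inherit an action of $\mathfrak{S}_4$ by taking $\sigma : [\sigma'] \mapsto [\sigma\sigma']$, which can be extended linearly to a representation of $\mathfrak{S}_4$ by taking the module

\[ \langle \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 \rangle_{\mathbb{C}} = \langle [e], [(13)], [(14)] \rangle_{\mathbb{C}} \cong \mathbb{C}^3, \]

with action defined as follows: given $\sigma \in \mathfrak{S}_4$, a vector $v = c_1[e] + c_2[(13)] + c_3[(14)]$ is mapped to

\[ \sigma \cdot v = c_1[\sigma] + c_2[\sigma(13)] + c_3[\sigma(14)]. \]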

We would like to decompose $\langle \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 \rangle_{\mathbb{C}}$ into irreducible modules of $\mathfrak{S}_4$. This can be achieved by applying the projection operators. For example:

\[
\Theta_{\{4\}} [e] = \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \sigma \cdot [e] = \tfrac{1}{24} \left( 8[e] + 8[(13)] + 8[(14)] \right),
\]

where, in the last equality, we have identified common cosets (e.g. $[(12)] = [e]$, etc.). As this projection is non-zero, we conclude that $\langle \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 \rangle_{\mathbb{C}}$ contains the trivial representation $\{4\}$. It is easy to check that $\Theta_{\{4\}}[(13)] = \Theta_{\{4\}}[(14)] = \tfrac{1}{3} \left( [e] + [(13)] + [(14)] \right)$, so in fact $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ contains $\{4\}$ only once. Now, referring to Table 1, we have

\[
\begin{aligned}
\Theta_{\{31\}} [e] &= \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \chi^{\{31\}}(\sigma)\, \sigma \cdot [e] \\
&= \tfrac{1}{24} \big( 3e + (12) + (13) + (14) + (23) + (24) + (34) - (12)(34) - (13)(24) \\
&\qquad\quad - (14)(23) - (1234) - (1243) - (1324) - (1342) - (1423) - (1432) \big) \cdot [e] \\
&= 0,
\end{aligned}
\]

where again the final equality follows by identifying common cosets. Similarly, we check that $\Theta_{\{31\}}[(13)] = \Theta_{\{31\}}[(14)] = 0$ and we learn that $\langle \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 \rangle_{\mathbb{C}}$ does not contain the representation $\{31\}$. In any case we could have concluded this from the outset by noting that both $\{31\}$ and $\langle \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 \rangle_{\mathbb{C}}$ are three-dimensional and we have already accounted for one of these dimensions of $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ with the trivial representation. Referring again to Table 1, we see that either $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ contains two copies of the sign representation $\{1^4\}$ or a single copy of $\{2^2\}$.

⁴ This is because each copy can be mapped to the others by conjugation with a permutation $\sigma \in \mathfrak{S}_4$.
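Consider

\[
\begin{aligned}
\Theta_{\{2^2\}} [e] &= \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \chi^{\{2^2\}}(\sigma)\, \sigma \cdot [e] \\
&= \tfrac{1}{24} \big( 2e - (123) - (132) - (124) - (142) - (134) - (143) - (234) - (243) \\
&\qquad\quad + 2(12)(34) + 2(13)(24) + 2(14)(23) \big) \cdot [e] \\
&= \tfrac{1}{24} \left( 8[e] - 4[(13)] - 4[(14)] \right) = \tfrac{1}{6} \left( 2[e] - [(13)] - [(14)] \right),
\end{aligned}
\]

where we have used that the three double transpositions lie in $[e]$, while four of the eight 3-cycles lie in $[(13)]$ and the other four lie in $[(14)]$. Similarly, since the double transpositions fix every coset and the 3-cycles send $[(13)]$ to $[e]$ and to $[(14)]$ four times each,

\[
\begin{aligned}
\Theta_{\{2^2\}} [(13)] &= \tfrac{1}{24} \sum_{\sigma \in \mathfrak{S}_4} \chi^{\{2^2\}}(\sigma)\, \sigma \cdot [(13)] \\
&= \tfrac{1}{24} \left( 8[(13)] - 4[e] - 4[(14)] \right) = \tfrac{1}{6} \left( 2[(13)] - [e] - [(14)] \right).
\end{aligned}
\]

Putting this together we have

\[
\begin{aligned}
\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}} &= \langle [e] + [(13)] + [(14)],\; 2[e] - [(13)] - [(14)],\; 2[(13)] - [e] - [(14)] \rangle_{\mathbb{C}} \\
&\cong \{4\} \oplus \{2^2\}.
\end{aligned}
\]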
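The same decomposition can be obtained mechanically. The sketch below does not apply the projection operators coset by coset; instead it uses the equivalent character inner product: the multiplicity of an irreducible $\lambda$ in $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ is $\tfrac{1}{24}\sum_\sigma \chi^\lambda(\sigma)\,\pi(\sigma)$, where $\pi(\sigma)$ counts the cosets fixed by $\sigma$. The character values are the standard ones for $\mathfrak{S}_4$; permutations act on $\{0,1,2,3\}$ rather than $\{1,2,3,4\}$, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# A permutation is a tuple p with p[x] the image of x, on the points 0..3.
S4 = list(permutations(range(4)))

def compose(p, q):
    """(p o q)(x) = p(q(x)): apply q first, then p."""
    return tuple(p[q[x]] for x in range(4))

def cycle_type(p):
    seen, lengths = set(), []
    for x in range(4):
        if x not in seen:
            n, y = 0, x
            while y not in seen:
                seen.add(y)
                y = p[y]
                n += 1
            lengths.append(n)
    return tuple(sorted(lengths, reverse=True))

# Character table of S_4, indexed by cycle type.
CHAR = {
    "{4}":    {(1, 1, 1, 1): 1, (2, 1, 1): 1,  (2, 2): 1,  (3, 1): 1,  (4,): 1},
    "{31}":   {(1, 1, 1, 1): 3, (2, 1, 1): 1,  (2, 2): -1, (3, 1): 0,  (4,): -1},
    "{2^2}":  {(1, 1, 1, 1): 2, (2, 1, 1): 0,  (2, 2): 2,  (3, 1): -1, (4,): 0},
    "{21^2}": {(1, 1, 1, 1): 3, (2, 1, 1): -1, (2, 2): -1, (3, 1): 0,  (4,): 1},
    "{1^4}":  {(1, 1, 1, 1): 1, (2, 1, 1): -1, (2, 2): 1,  (3, 1): 1,  (4,): -1},
}

def generate(gens):
    """Subgroup generated by gens (closure under left multiplication)."""
    identity = tuple(range(4))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def cosets(H):
    """Left cosets sigma*H, each stored as a frozenset of permutations."""
    return {frozenset(compose(s, h) for h in H) for s in S4}

def decomposition(H):
    """Multiplicities of the irreducibles in the coset module of H."""
    cs = cosets(H)
    fixed = {s: sum(1 for c in cs if frozenset(compose(s, x) for x in c) == c) for s in S4}
    result = {}
    for name, chi in CHAR.items():
        m = Fraction(sum(chi[cycle_type(s)] * fixed[s] for s in S4), len(S4))
        if m != 0:
            result[name] = int(m)
    return result

# Z_2 wr Z_2 generated by (12) and (13)(24), written on the points 0..3.
D4 = generate([(1, 0, 2, 3), (2, 3, 0, 1)])
print(len(cosets(D4)), decomposition(D4))
```

Passing another generating set to `generate` gives the decomposition for any other subgroup $H$.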

Proceeding as in this example, we have produced the results summarized in Table 2. In particular, Table 2 gives the decomposition of the action of $\mathfrak{S}_4$ on $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ into irreducible representations for each subgroup $H \le \mathfrak{S}_4$. The second column records how many copies of each subgroup $H$ occur in $\mathfrak{S}_4$, with non-conjugate copies accounted for by distinct decompositions in the fourth column. For example, there are two "types" of $\mathbb{Z}_2$ in $\mathfrak{S}_4$:

\[ \mathbb{Z}_2 \cong \{e, (12)\} \cong \{e, (13)\} \cong \{e, (14)\} \cong \{e, (23)\} \cong \{e, (24)\} \cong \{e, (34)\}, \]

or

\[ \mathbb{Z}_2 \cong \{e, (12)(34)\} \cong \{e, (13)(24)\} \cong \{e, (14)(23)\}. \]

These two types are structurally different and as a result, the corresponding spaces $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ have differing decomposition into irreducible subspaces, as shown in Table 2.

Similarly, there are two "types" of $\mathbb{Z}_2 \times \mathbb{Z}_2$:

\[ \mathbb{Z}_2 \times \mathbb{Z}_2 \cong \{e, (12), (34), (12)(34)\} \cong \{e, (13), (24), (13)(24)\} \cong \{e, (14), (23), (14)(23)\}, \]

and

\[ \mathbb{Z}_2 \times \mathbb{Z}_2 \cong \{e, (12)(34), (13)(24), (14)(23)\}. \]
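$H \le \mathfrak{S}_4$                                          Copies   Cardinality $= |\mathfrak{S}_4|/|H|$   Decomposition of $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$                     Model

$\{e\}$                                                         1        24      $\{4\} \oplus 3\{31\} \oplus 2\{2^2\} \oplus 3\{21^2\} \oplus \{1^4\}$      -
$\mathbb{Z}_2$ (e.g. $\{e,(12)\}$)                              6        12      $\{4\} \oplus 2\{31\} \oplus \{2^2\} \oplus \{21^2\}$                       GMM
$\mathbb{Z}_2$ (e.g. $\{e,(12)(34)\}$)                          3        12      $\{4\} \oplus \{31\} \oplus 2\{2^2\} \oplus \{21^2\} \oplus \{1^4\}$        -
$\mathbb{Z}_3$                                                  4        8       $\{4\} \oplus \{31\} \oplus \{21^2\} \oplus \{1^4\}$                        -
$\mathbb{Z}_4$                                                  3        6       $\{4\} \oplus \{2^2\} \oplus \{21^2\}$                                      -
$\mathbb{Z}_2 \times \mathbb{Z}_2$ (e.g. $\{e,(12),(34),(12)(34)\}$)   3   6     $\{4\} \oplus \{31\} \oplus \{2^2\}$                                        F81+K3ST
$\mathbb{Z}_2 \times \mathbb{Z}_2$ ($\{e,(12)(34),(13)(24),(14)(23)\}$)   1   6  $\{4\} \oplus 2\{2^2\} \oplus \{1^4\}$                                      -
$\mathfrak{S}_3$                                                4        4       $\{4\} \oplus \{31\}$                                                       F81
$\mathbb{Z}_2 \wr \mathbb{Z}_2$                                 3        3       $\{4\} \oplus \{2^2\}$                                                      K3ST
$A_4$                                                           1        2       $\{4\} \oplus \{1^4\}$                                                      -
$\mathfrak{S}_4$                                                1        1       $\{4\}$                                                                     Jukes-Cantor

Table 2: Decomposition of the orbits of $\mathfrak{S}_4$ into irreducible modules.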



Again, these two types have differing decomposition into irreducible subspaces, as shown in Table 2.

The final column in Table 2 gives the name of the Lie Markov model that has the $\mathfrak{S}_4$ symmetry defined by $\mathfrak{S}_4/H$. Presently we will show how these Lie Markov models arise by following the general scheme outlined in §4.

First we note that both the decomposition (14) of $\mathfrak{L}_{GMM}$ and the decomposition of each $\langle \mathfrak{S}_4/H \rangle_{\mathbb{C}}$ in Table 2 have exactly one copy of the trivial representation, hence we observe that any Lie Markov model must occur as a single orbit under the action of $\mathfrak{S}_4$. Since the orbit cardinalities occurring in Table 2 are 1, 2, 3, 4, 6, 8, 12 and 24, and $\mathfrak{L}_{GMM}$ has dimension twelve, we have as a consequence:

Result 12. In the four state case, there are no Lie Markov models with $\mathfrak{S}_4$ symmetry with dimension five, seven, nine, ten or eleven.
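The counting behind Result 12 is short enough to spell out. The sketch below assumes only what is stated above: a model with full symmetry is a single orbit, orbit cardinalities are $|\mathfrak{S}_4|/|H| = 24/|H|$ for the subgroup orders appearing in Table 2, and no model can exceed the dimension twelve of the general Markov model. The variable names are illustrative.

```python
GROUP_ORDER = 24   # |S_4|
MAX_DIM = 12       # dimension of the general Markov model on four states

# Orders of the subgroups H listed in Table 2.
subgroup_orders = [1, 2, 3, 4, 6, 8, 12, 24]

# Possible orbit cardinalities |S_4| / |H|.
orbit_sizes = sorted({GROUP_ORDER // h for h in subgroup_orders})

# Dimensions up to twelve that cannot occur as a single orbit.
excluded = [d for d in range(1, MAX_DIM + 1) if d not in orbit_sizes]
print(orbit_sizes)  # [1, 2, 3, 4, 6, 8, 12, 24]
print(excluded)     # [5, 7, 9, 10, 11]
```

The remaining candidate dimensions (one, two, three, four, six, eight and twelve) are exactly the cases treated one by one below.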



Dimension One 



From Table 2 we see that there is only one abstract orbit of $\mathfrak{S}_4$ with cardinality one. Thus, any orbit of cardinality one is isomorphic to $\mathfrak{S}_4/\mathfrak{S}_4$, with decomposition $\langle \mathfrak{S}_4/\mathfrak{S}_4 \rangle_{\mathbb{C}} \cong \{4\}$. As the general Markov model contains one copy of the trivial representation, we conclude:

Result 13. In the four state case, there is only a single one-dimensional Lie Markov model with $\mathfrak{S}_4$ symmetry.

It is immediate that the choice $B_{JC} := \{L_{id}\}$ provides a stochastic basis and this model is nothing but the Jukes-Cantor model (Jukes & Cantor, 1969) with a typical rate-matrix taking the form:

\[
Q = \alpha L_{id} =
\begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix},
\]

with associated graph given in Figure 4. It is also worth noting that, because $\mathfrak{S}_n/\mathfrak{S}_n$ always has cardinality one, this result extends easily to the $n$-state case.

Figure 4: Graphical representation of the Jukes-Cantor model.

Dimension Two

From Table 2 we see that the orbit with cardinality two is $\mathfrak{S}_4/A_4$ with decomposition $\langle \mathfrak{S}_4/A_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{1^4\}$. However the general Markov model does not contain a copy of the sign representation $\{1^4\}$, so we conclude immediately:

Result 14. In the four state case, there is no two-dimensional Lie Markov model with $\mathfrak{S}_4$ symmetry.



Dimension Three 



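From Table 2 we see that the only orbit with cardinality three is $\mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2$, with decomposition $\langle \mathfrak{S}_4 / \mathbb{Z}_2 \wr \mathbb{Z}_2 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{2^2\}$. As we saw in (17), this subspace is given by

\[ \langle L_\alpha, L_\beta, L_\gamma \rangle_{\mathbb{C}} \cong \{4\} \oplus \{2^2\}, \]

with abelian Lie algebra⁵

\[ [L_\alpha, L_\beta] = [L_\alpha, L_\gamma] = [L_\beta, L_\gamma] = 0. \]

A generic rate-matrix in this model looks like

\[
Q =
\begin{pmatrix}
* & \alpha & \beta & \gamma \\
\alpha & * & \gamma & \beta \\
\beta & \gamma & * & \alpha \\
\gamma & \beta & \alpha & *
\end{pmatrix}
= \alpha L_\alpha + \beta L_\beta + \gamma L_\gamma,
\]

where $* = -\alpha - \beta - \gamma$. This is, of course, the Kimura 3ST model (Kimura, 1981) with associated graph given in Figure 3.

Result 15. In the four state case, the only three-dimensional Lie Markov model with $\mathfrak{S}_4$ symmetry is the Kimura 3ST model.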



⁵ We note here that the fact that the Lie algebra of the Kimura 3ST model is abelian was first explicitly discussed in Bashford et al. (2001).

Dimension Four

From Table 2 we see that the only orbit with cardinality four is $\mathfrak{S}_4/\mathfrak{S}_3$, with decomposition $\langle \mathfrak{S}_4/\mathfrak{S}_3 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\}$. As we saw in (15), this subspace is given by either

\[ \langle R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\}, \]

or

\[ \langle C_1, C_2, C_3, C_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\}. \]

By referring to (19), we see that of these two possibilities only $\langle R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}}$ forms a Lie algebra:

\[ [R_i, R_j] = R_i - R_j. \]

A generic rate-matrix in this model looks like

\[
Q =
\begin{pmatrix}
* & a & a & a \\
b & * & b & b \\
c & c & * & c \\
d & d & d & *
\end{pmatrix}
= aR_1 + bR_2 + cR_3 + dR_4,
\]

where the diagonal entries are determined by the zero column-sum condition. This is of course the Felsenstein 81 model (Felsenstein, 1981) with associated graph given in Figure 5.

Figure 5: Graphical representation of the Felsenstein 81 model.

Thus we have:

Result 16. In the four state case, the only four-dimensional Lie Markov model with $\mathfrak{S}_4$ symmetry is the Felsenstein 81 model.
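The bracket $[R_i, R_j] = R_i - R_j$ can also be checked by hand. As a sketch, assume (consistently with the generic rate-matrix above, though the explicit form is our reading rather than a formula stated in this section) that $R_i = e_i \mathbf{1}^T - I$, where $e_i$ is the $i$th standard basis vector, $\mathbf{1}$ is the all-ones vector and $I$ is the identity. Then, using $\mathbf{1}^T e_j = 1$,

\[
\begin{aligned}
R_i R_j &= (e_i \mathbf{1}^T - I)(e_j \mathbf{1}^T - I) \\
&= e_i (\mathbf{1}^T e_j) \mathbf{1}^T - e_i \mathbf{1}^T - e_j \mathbf{1}^T + I \\
&= I - e_j \mathbf{1}^T = -R_j,
\end{aligned}
\]

and hence

\[ [R_i, R_j] = R_i R_j - R_j R_i = -R_j + R_i = R_i - R_j. \]

In particular the product of any two elements of $\langle R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}}$ stays in that span, which is a stronger property than closure under the bracket alone.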

Dimension Six 

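At cardinality six, we have the orbit $\mathfrak{S}_4 / \mathbb{Z}_2 \times \mathbb{Z}_2$, but there are two non-conjugate types of $\mathbb{Z}_2 \times \mathbb{Z}_2$ that we need to consider:

\[ \{e, (12), (34), (12)(34)\} \cong \{e, (13), (24), (13)(24)\} \cong \{e, (14), (23), (14)(23)\} \cong \mathbb{Z}_2 \times \mathbb{Z}_2, \]

and

\[ \{e, (12)(34), (13)(24), (14)(23)\} \cong \mathbb{Z}_2 \times \mathbb{Z}_2. \]

The second type is a normal subgroup of $\mathfrak{S}_4$, so the action on its cosets factors through the quotient $\mathfrak{S}_4/(\mathbb{Z}_2 \times \mathbb{Z}_2) \cong \mathfrak{S}_3$, and we have the decomposition

\[ \langle \mathfrak{S}_4 / \{e, (12)(34), (13)(24), (14)(23)\} \rangle_{\mathbb{C}} \cong \{4\} \oplus 2\{2^2\} \oplus \{1^4\}. \]

We see immediately that this cannot support a submodel because of the occurrence of the sign representation $\{1^4\}$. For the first type, the cosets correspond to the six unordered pairs $\{i,j\}$, and we have the decomposition

\[ \langle \mathfrak{S}_4 / \{e, (12), (34), (12)(34)\} \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\} \oplus \{2^2\}, \]

and we see that this orbit may support a submodel. Referring to Result 10 we have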

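\[ \langle L_\alpha, L_\beta, L_\gamma, R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}} \cong \langle L_\alpha, L_\beta, L_\gamma, C_1, C_2, C_3, C_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\} \oplus \{2^2\}, \]

with the linear relations $L_\alpha + L_\beta + L_\gamma = R_1 + R_2 + R_3 + R_4 = C_1 + C_2 + C_3 + C_4 = L_{id}$. However, referring to (19), it is clear that $\langle L_\alpha, L_\beta, L_\gamma, C_1, C_2, C_3, C_4 \rangle_{\mathbb{C}}$ does not form a Lie algebra and can thus be excluded. On the other hand, we see that $\langle L_\alpha, L_\beta, L_\gamma, R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}}$ forms a Lie algebra as we have the "cross brackets":

\[ [L_{ij|kl}, R_i] = R_j - R_i. \]

We refer to this model as K3ST+F81, where because of the linear relation we have

\[
\begin{aligned}
\dim \langle L_\alpha, L_\beta, L_\gamma, R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}} &= \dim \langle L_\alpha, L_\beta, L_\gamma \rangle_{\mathbb{C}} + \dim \langle R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}} - 1 \\
&= 3 + 4 - 1 = 6.
\end{aligned}
\]

The other possibility for dimension six is the orbit given by $\mathfrak{S}_4/\mathbb{Z}_4$, with decomposition $\langle \mathfrak{S}_4/\mathbb{Z}_4 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{2^2\} \oplus \{21^2\}$. Again referring to Result 10 we have

\[ \langle \{P_{ij}\}_{i<j}, L_\alpha, L_\beta, L_\gamma \rangle_{\mathbb{C}} \cong \{4\} \oplus \{2^2\} \oplus \{21^2\}. \]

However this module does not form a Lie algebra, as, for example,

\[ [L_\alpha, P_{12}] = 2L_\alpha + 2L_\beta + 2L_\gamma - 4R_2 - 2R_3 - 2R_4 + 2C_1 - 2C_2. \]

Result 17. The only six-dimensional Lie Markov model with $\mathfrak{S}_4$ symmetry is the vector space sum of the Kimura 3ST and Felsenstein 81 models: $\langle L_\alpha, L_\beta, L_\gamma, R_1, R_2, R_3, R_4 \rangle_{\mathbb{C}}$.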
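Whether a candidate subspace is a Lie algebra can be tested by a rank computation: the span is closed exactly when adjoining all pairwise brackets of a spanning set does not increase the dimension. The sketch below applies this test to the three six-dimensional candidates discussed above. It assumes the conventions $L_{ij} = E_{ij} - E_{jj}$, $R_i = \sum_{j\ne i} L_{ij}$, $C_i = \sum_{j\ne i} L_{ji}$ and $P_{ij} = 2A_{ij} - A_{ik} - A_{il} + A_{jk} + A_{jl}$; `closure_report` and the other names are illustrative.

```python
from fractions import Fraction

N = 4
IDX = range(1, N + 1)

def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (assumed convention, 1-based)."""
    m = [[0] * N for _ in range(N)]
    m[i - 1][j - 1] += 1
    m[j - 1][j - 1] -= 1
    return m

def lin(terms):
    """Linear combination of matrices; terms is a list of (coefficient, matrix)."""
    return [[sum(c * m[r][s] for c, m in terms) for s in range(N)] for r in range(N)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab = [[sum(a[r][k] * b[k][s] for k in range(N)) for s in range(N)] for r in range(N)]
    ba = [[sum(b[r][k] * a[k][s] for k in range(N)) for s in range(N)] for r in range(N)]
    return [[ab[r][s] - ba[r][s] for s in range(N)] for r in range(N)]

def rank(mats):
    """Dimension of the span of the matrices (exact Gaussian elimination)."""
    rows = [[Fraction(x) for row in m for x in row] for m in mats]
    r = 0
    for col in range(N * N):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def L_part(i, j):
    """L_{ij|kl} for the bipartition containing the pair {i, j}."""
    k, l = [m for m in IDX if m not in (i, j)]
    return lin([(1, L(i, j)), (1, L(j, i)), (1, L(k, l)), (1, L(l, k))])

def A(i, j):
    return lin([(1, L(i, j)), (-1, L(j, i))])

def P(i, j):
    k, l = [m for m in IDX if m not in (i, j)]
    return lin([(2, A(i, j)), (-1, A(i, k)), (-1, A(i, l)), (1, A(j, k)), (1, A(j, l))])

L_abc = [L_part(1, j) for j in (2, 3, 4)]
R = [lin([(1, L(i, j)) for j in IDX if j != i]) for i in IDX]
C = [lin([(1, L(j, i)) for j in IDX if j != i]) for i in IDX]
P_all = [P(i, j) for i in IDX for j in IDX if i < j]

def closure_report(spanning_set):
    """Return (dimension of the span, whether the span is closed under brackets)."""
    d = rank(spanning_set)
    brackets = [bracket(a, b) for a in spanning_set for b in spanning_set]
    return d, rank(spanning_set + brackets) == d

print("L and R:", closure_report(L_abc + R))
print("L and C:", closure_report(L_abc + C))
print("P and L:", closure_report(P_all + L_abc))
```

Because the bracket is bilinear, checking a spanning set is enough to decide closure of the whole span.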

Constructing a basis for this six-dimensional model that exhibits the permutation symmetry is non-trivial, but can be achieved as follows. Notice that the set of pairs

\[ T := \{\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}\} \]

forms a set of cardinality six that is invariant under the permutations $\sigma \in \mathfrak{S}_4$. Now consider the following (surjective) map between this set of pairs and the set of bipartitions $S = \{12|34, 13|24, 14|23\}$:

\[
\begin{aligned}
\{1,2\} &\mapsto 12|34, \\
\{1,3\} &\mapsto 13|24, \\
\{1,4\} &\mapsto 14|23, \\
\{2,3\} &\mapsto 14|23, \\
\{2,4\} &\mapsto 13|24, \\
\{3,4\} &\mapsto 12|34.
\end{aligned}
\]

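The existence of this map motivates the construction of the tangent vectors

\[ W_{ij} := L_{ij|kl} + R_i + R_j, \]

where $ij|kl$ is the bipartition associated with the pair $\{i,j\}$, and we see that we may then take $B_{F81+K3ST} = \{W_{12}, W_{34}, W_{13}, W_{24}, W_{14}, W_{23}\}$ as a stochastic basis for $\mathfrak{L}_{F81+K3ST}$. A generic rate-matrix for this model has the form

\[
\begin{aligned}
Q &:= \alpha W_{12} + \hat{\alpha} W_{34} + \beta W_{13} + \hat{\beta} W_{24} + \gamma W_{14} + \hat{\gamma} W_{23} \\
&= \begin{pmatrix}
* & 2\alpha + \hat{\alpha} + \beta + \gamma & 2\beta + \alpha + \hat{\beta} + \gamma & 2\gamma + \alpha + \beta + \hat{\gamma} \\
2\alpha + \hat{\alpha} + \hat{\beta} + \hat{\gamma} & * & 2\hat{\gamma} + \alpha + \hat{\beta} + \gamma & 2\hat{\beta} + \alpha + \beta + \hat{\gamma} \\
2\beta + \hat{\alpha} + \hat{\beta} + \hat{\gamma} & 2\hat{\gamma} + \hat{\alpha} + \beta + \gamma & * & 2\hat{\alpha} + \alpha + \beta + \hat{\gamma} \\
2\gamma + \hat{\alpha} + \hat{\beta} + \hat{\gamma} & 2\hat{\beta} + \hat{\alpha} + \beta + \gamma & 2\hat{\alpha} + \alpha + \hat{\beta} + \gamma & *
\end{pmatrix},
\end{aligned}
\]

where the diagonal entries are determined by the zero column-sum condition.

Model            Dimension   Lie algebra

GMM              12          $[L_{ij}, L_{kl}] = (L_{il} - L_{jl})(\delta_{jk} - \delta_{jl}) - (L_{kj} - L_{lj})(\delta_{il} - \delta_{jl})$
K3ST+F81         6           $[L_{ij|kl}, R_i] = R_j - R_i$
Felsenstein 81   4           $[R_i, R_j] = R_i - R_j$
Kimura 3ST       3           $[L_\alpha, L_\beta] = [L_\alpha, L_\gamma] = [L_\beta, L_\gamma] = 0$
Jukes-Cantor     1

Table 3: The complete list of four state Lie Markov models with $\mathfrak{S}_4$ symmetry. Note that $L_{12|34} := L_\alpha$, $L_{13|24} := L_\beta$ and $L_{14|23} := L_\gamma$.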

The Lie algebra of this model in the above basis can be found via explicit computation. If $ij|kl$ is a bipartition, then

\[ [W_{ij}, W_{kl}] = 2 \left( W_{ij} - W_{kl} \right). \]

Otherwise, if $ij|i'j'$ and $kl|k'l'$ are distinct bipartitions, then

\[ [W_{ij}, W_{kl}] = 2 \left( W_{ij} - W_{i'j'} \right) - 2 \left( W_{kl} - W_{k'l'} \right). \]
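These two bracket rules can be confirmed by direct computation. The sketch below assumes that the tangent vectors are $W_{ij} = L_{ij|kl} + R_i + R_j$ (an explicit form chosen here because it reproduces both rules and a stochastic basis; treat it as an assumption), together with $L_{ij} = E_{ij} - E_{jj}$ and $R_i = \sum_{j \ne i} L_{ij}$. It also builds one generic rate-matrix with arbitrary illustrative weights and checks its column sums.

```python
N = 4
IDX = range(1, N + 1)

def L(i, j):
    """Elementary rate matrix L_ij = E_ij - E_jj (assumed convention, 1-based)."""
    m = [[0] * N for _ in range(N)]
    m[i - 1][j - 1] += 1
    m[j - 1][j - 1] -= 1
    return m

def lin(terms):
    """Linear combination of matrices; terms is a list of (coefficient, matrix)."""
    return [[sum(c * m[r][s] for c, m in terms) for s in range(N)] for r in range(N)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab = [[sum(a[r][k] * b[k][s] for k in range(N)) for s in range(N)] for r in range(N)]
    ba = [[sum(b[r][k] * a[k][s] for k in range(N)) for s in range(N)] for r in range(N)]
    return [[ab[r][s] - ba[r][s] for s in range(N)] for r in range(N)]

def comp(pair):
    """Complementary pair: comp((1, 2)) == (3, 4)."""
    return tuple(m for m in IDX if m not in pair)

def L_part(pair):
    """L_{ij|kl} for the bipartition containing the given pair."""
    (i, j), (k, l) = pair, comp(pair)
    return lin([(1, L(i, j)), (1, L(j, i)), (1, L(k, l)), (1, L(l, k))])

R = {i: lin([(1, L(i, j)) for j in IDX if j != i]) for i in IDX}

def W(pair):
    """Assumed tangent vector W_ij = L_{ij|kl} + R_i + R_j."""
    i, j = pair
    return lin([(1, L_part(pair)), (1, R[i]), (1, R[j])])

PAIRS = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# Rule 1: pairs from the same bipartition.
same_rule = all(
    bracket(W(p), W(comp(p))) == lin([(2, W(p)), (-2, W(comp(p)))]) for p in PAIRS)

# Rule 2: pairs from distinct bipartitions.
cross_rule = all(
    bracket(W(p), W(q)) == lin([(2, W(p)), (-2, W(comp(p))), (-2, W(q)), (2, W(comp(q)))])
    for p in PAIRS for q in PAIRS if q != p and q != comp(p))

# A generic rate-matrix with illustrative weights on W_12, W_34, W_13, W_24, W_14, W_23.
weights = {(1, 2): 1, (3, 4): 10, (1, 3): 100, (2, 4): 1000, (1, 4): 10000, (2, 3): 100000}
Q = lin([(w, W(p)) for p, w in weights.items()])
column_sums = [sum(Q[r][s] for r in range(N)) for s in range(N)]

print(same_rule, cross_rule, Q[0][1], column_sums)
```

The powers of ten make it easy to read off which basis elements contribute to a given entry of `Q`.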

Dimension Eight 

From Table 2 we see that the only $\mathfrak{S}_4$ orbit with cardinality eight is $\mathfrak{S}_4/\mathbb{Z}_3$, with decomposition $\langle \mathfrak{S}_4/\mathbb{Z}_3 \rangle_{\mathbb{C}} \cong \{4\} \oplus \{31\} \oplus \{21^2\} \oplus \{1^4\}$. Again, due to the occurrence of the sign representation $\{1^4\}$, we conclude that:

Result 18. There is no eight-dimensional Lie Markov model with $\mathfrak{S}_4$ symmetry.

Summarizing these results is the main outcome of this article:

Theorem 6.1. On four character states, there are exactly five Lie Markov models with $\mathfrak{S}_4$ symmetry. These models have dimension one, three, four, six and twelve, and Lie algebras as given in Table 3.



7 Discussion 

In this article we have discussed closure of continuous-time Markov chains, and we have shown that requiring that the rate-matrices drawn from a model form a Lie algebra provides a sufficient condition for closure. In §2 we showed that the GTR model (which is the basis for most current phylogenetic studies) does not satisfy the closure condition and therefore is not a Lie Markov model (see Definition 2.7). We showed that the general Markov model is a Lie Markov model and described its corresponding Lie algebra. In §3 we gave a complete description of the two-state Lie Markov models, showing that there is actually an infinite continuum of Lie Markov models in this case. In §4 we gave a new characterization of the concept of the symmetry of Markov models, and we went on to use this characterization to assist in the search for Lie Markov models on four character states.

In §5 we outlined a general scheme that produces the complete list of Lie Markov models with a given symmetry. The main result of the article is then Theorem 6.1, which shows that, for four character states, there are exactly five Lie Markov models with maximal symmetry, $\mathfrak{S}_4$. Four of these are well known models: the Jukes-Cantor, the Kimura 3ST, the Felsenstein 81 and the general Markov model, and the fifth can be interpreted as the merging of the Kimura 3ST and Felsenstein 81 models.

The immediate avenue for future work is to explore Lie Markov models with slightly relaxed symmetry. As stated at the end of §5, we have successfully applied our general methods to the four-state case with relaxed symmetries $G < \mathfrak{S}_4$ in the cases of $G \cong \mathbb{Z}_2 \wr \mathbb{Z}_2$, $\mathbb{Z}_2 \times \mathbb{Z}_2$ and $\mathbb{Z}_4$. The case $G \cong \mathbb{Z}_2 \wr \mathbb{Z}_2$ is of particular interest as the copy

\[ \mathbb{Z}_2 \wr \mathbb{Z}_2 \cong \{e, (AG), (CT), (AG)(CT), (AC)(GT), (AT)(CG), (ACGT), (ATGC)\} \]

is exactly the set of nucleotide permutations that preserves the partitioning into purines and pyrimidines: $AG|CT := \{\{A,G\}, \{C,T\}\}$. For example

\[
\begin{aligned}
(AG) \cdot AG|CT &= GA|CT = AG|CT, \\
(ACGT) \cdot AG|CT &= CT|GA = AG|CT.
\end{aligned}
\]
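The claim that exactly eight nucleotide permutations preserve the purine/pyrimidine partition can be checked by enumeration. The sketch below represents each permutation as a dictionary from nucleotide to image (a representation chosen here for illustration) and keeps those that map the partition $\{\{A,G\},\{C,T\}\}$ to itself.

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"
# Purines {A, G} and pyrimidines {C, T}.
PARTITION = frozenset({frozenset("AG"), frozenset("CT")})

def image_of_partition(perm):
    """Apply a permutation (dict nucleotide -> nucleotide) to every block."""
    return frozenset(frozenset(perm[x] for x in block) for block in PARTITION)

stabilizer = []
for image in permutations(NUCLEOTIDES):
    perm = dict(zip(NUCLEOTIDES, image))
    if image_of_partition(perm) == PARTITION:
        stabilizer.append(perm)

print(len(stabilizer))  # 8

# The 4-cycle (ACGT) swaps the two blocks and so preserves the partition.
four_cycle = {"A": "C", "C": "G", "G": "T", "T": "A"}
print(four_cycle in stabilizer)  # True
```

A transposition such as $(AC)$ mixes a purine with a pyrimidine and is therefore excluded.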

We defer presentation of the list of Lie Markov models with these relaxed symmetries to a forthcoming publication.

Of course, if one were to demand that the symmetry of Lie Markov models was the trivial group $G = \{e\}$, this would amount to taking no particular symmetry at all. At this point our methods break down and one is simply back at asking for all the sub-algebras of $\mathfrak{L}_{GMM}$ that have a stochastic basis. Unfortunately, as was discussed at the start of §6, the methods outlined in the paper cannot be used to address the question of Lie Markov models at this level of generality.

Our presentation of Lie Markov models for a given symmetry has very desirable properties in terms of model selection. For instance, the practicing biologist may wish that candidate models do not provide any natural groupings of nucleotides. In that case the $\mathfrak{S}_4$ symmetry is appropriate, our list of five models gives the appropriate candidates, and it is then a matter of choosing how many free parameters are appropriate for the given data set. On the other hand, the biologist may wish to distinguish between purines and pyrimidines. In this case, as discussed above, the (forthcoming) hierarchy corresponding to Lie Markov models with $\mathbb{Z}_2 \wr \mathbb{Z}_2$ symmetry, ordered by number of free parameters, would be most appropriate.

Another avenue of interesting theoretical research is to explore expanding the definition of a Lie Markov model arising from a Lie algebra $\mathfrak{L}$ from the set $e^{\mathfrak{L}}$ to the Lie group of transition matrices whose tangent space at an arbitrary point is the Lie algebra $\mathfrak{L}$ (whose individual members need not arise as the exponential of a rate-matrix). Again powerful methods of Lie theory are relevant here and simple topological questions such as "Is $e^{\mathfrak{L}}$ equal to the connected component to the identity?" are most natural. A complete analysis would seek to give an understanding of the geometric aspects of these Lie groups. For example, the points where the determinant of the transition matrices is equal to zero will form an algebraic variety that acts as a boundary to the group. This justifies our comment at the start of §2 that the key to understanding Markov matrices is to first understand them as Lie groups. This would also go a long way in clarifying the connections between the discrete Markov chains (or "algebraic models" (Pachter & Sturmfels, 2005)) and the continuous-time formulation provided by Lie Markov models.



Finally, we can see from the results in this article that the success of the Hadamard approach to the Kimura 3ST model (Hendy & Penny, 1989), and its generalization via Fourier analysis to arbitrary (abelian) "group-based" models (Székely et al., 1993), results directly from the fact that the Lie algebras of these models are abelian. In Lie theory language this means that any representation of the K3ST Lie algebra can be decomposed into one-dimensional irreducible representations, i.e. it is fully "diagonalizable". It is then interesting to try to understand the other Lie Markov models by applying techniques from Lie theory such as the sequence of derived sub-algebras and Cartan sub-algebras (Erdmann & Wildon, 2006). This point of view has significant potential to generalize the "thin flattenings" of Casanellas & Fernández-Sánchez (2010) applied to equivariant models, although the exact connections of these two points of view are not apparent to the authors at this stage.